Chapter 1: GENERAL INTRODUCTION

Understanding the factors influencing the distributions and associations of plant and 
animal species is a principal topic for ecologists. The importance of environmental conditions 
in influencing species' distributions and associations is one of the most basic concepts in 
ecological theory. Although studies examining such relationships are simple when restricted 
to one or two species, as the number of species and environmental factors considered 
increases, the complexity of the analyses increases exponentially. As a result, ecologists have 
turned to an array of statistical methods to summarize the patterns in their data. 

The literature shows great variation in both the relative importance attributed to 
specific environmental factors in fish and benthic invertebrate community ecology and the 
forms of statistical analysis used in community ecology in general. The following review highlights some
of the disparate results found within both fields. 



Benthic Invertebrate Community Studies 

Water chemistry has been identified as a major influence in determining the 
composition of benthic communities. Townsend et al. (1987) identified pH as the primary 
factor in community structure, as did Dermott (1985) and Friday (1987). Changes in acidity 
may influence the benthic community directly due to differences in species tolerance (e.g., 
Harvey and McArdle 1986) or indirectly through changes in predation due to loss of sensitive
species (Harvey 1982; Nero and Schindler 1983) and changes in food resources due to
changes in productivity and nutrient cycling of systems (Mackay and Kersey 1985). For 
example, Perry and Sheldon (1986) found that productivity was the principal factor 
determining community structure in streams. Further examples of chemical influences on 
communities include those of: (i) Leland et al. (1986) who demonstrated community changes 
in response to additions of heavy metals; (ii) Stull et al. (1986) who showed recovery of 
marine benthic communities after the quality of municipal effluent was improved, and (iii)
Hughes and Thomas (1971) who found the salinity gradient to be important. 



Sediment characteristics have been shown to be important in many benthic community 
studies (e.g., Cassie and Michael 1968; Allison and Harvey 1986) whereas Ormerod and 
Edwards (1987) found no relationship between substrate and community composition. 
Predation by fishes has been suggested as important (Bendell and McNichol 1987) as sites 
lacking fish have larger benthic species and increased abundances of many taxa (but see 
Townsend et al. 1987). However, detailed concurrent analyses of fish and benthic community 
structure have been rare, if not totally absent from the literature. Differences in vegetational 
composition and abundance have been found to be important in characterizing benthic 
communities (Glazier and Gooch 1987; Beckett et al. 1992). 

Given the large number of different factors identified, it is difficult to formulate
conclusions except that the response by benthic communities is complex and dependent on
many different factors. Glazier and Gooch (1987) found individual species were associated
with specific physical, chemical, and vegetational factors. Species assemblages were lacking
and the authors suggested that strong inter-specific interactions were unimportant in
determining community structure. In contrast, Winterbourn and Collier (1987) found no
relationship between invertebrate species composition and environmental variables, but rather
streams in close proximity had similar communities. Therefore, the availability of suitable
colonizers sets the limits on faunal richness and is important in determining the composition of
benthic assemblages.

Clearly, no specific factors can be identified that have universal
importance in structuring either fish or invertebrate communities. An additional problem 
became obvious during our review of the aquatic community literature. Particularly within 
the invertebrate studies, authors are using a variety of approaches in data analysis, likely with 
little appreciation for the impact such choices have in influencing their results. Decisions 
about the qualitative form of the data (e.g., presence-absence versus abundance), 
standardization (e.g., untransformed, log-transformed, ratios, percentages), measure of 
similarity between sites or taxa, and choice of multivariate presentation (e.g., principal 
components analysis, correspondence analysis, clustering) vary across the studies. In fact, it 
is challenging to find studies employing similar statistical protocols that permit comparisons 
of results from different sites. Although many of these aspects have been considered 
individually or in limited combinations within the literature (principally with plant
communities), more detailed comparisons are lacking.

Multivariate Analysis 

Multivariate methods have a long history of use in comparing ecological sites or 
species associations and have become increasingly popular during the last twenty years (see 
reviews in Gauch 1982a; Legendre and Legendre 1983; Pielou 1984; Digby and Kempton 
1987; Ludwig and Reynolds 1988) due to the availability of computers and associated 
statistical packages. The advantage of the multivariate methods is that they summarize 
patterns within the data and display such summaries in a reduced number of dimensions. 
Depending on the choice of technique, a number of dimensions is displayed which represent 
inter-specific or inter-site relationships while minimizing the distortion in the representation.
The majority of such multivariate techniques may be grouped into either methods of cluster 
analysis (i.e., classification) or ordination. The assumptions inherent within each group of 
methods differ and, as a result, the choice of a single method may predicate the results. 

Cluster analysis (typically hierarchical) will form groups or clusters from data 
irrespective of whether the data have discrete or continuous distributions. Use of cluster 
analysis with data representing an ecological continuum (i.e., no distinct boundaries or 
discontinuities in the data) may lead to inappropriate interpretations with respect to species 
associations or distributions. As a result, the indiscriminate use of cluster analysis has been
increasingly criticized (e.g., Henderson 1985; Jain et al. 1986; Jackson et al. 1989).

A second group of multivariate techniques comprises methods of ordination analysis. 
This group includes the more commonly used techniques of principal components analysis, 
factor analysis, principal coordinates analysis, nonmetric multidimensional scaling, 
correspondence analysis, and detrended correspondence analysis. Whereas these techniques 
have attributes and assumptions that differ amongst the methods, they do not include the 
assumption of discrete, natural groups existing in the data. Therefore, they are more 
appropriate if the data represent a continuum or if it is unknown whether the data are from a 
continuum or discrete groups. Within this group, there is considerable variation in measures 
of similarity and ordination. However, as indicated above, the importance of these particular 
forms of data analysis is often unrecognized.

Study Overview 

To address this perceived shortcoming, much of our study examines the relative 
importance of such aspects of data analysis. In Chapter 2, we develop a new method of 
testing for the concordance between data matrices. The method (PROTEST) can be used to 
provide either direct or indirect gradient analysis for ecologists. This method was developed
because many existing methods (e.g., canonical correlation analysis) have specific assumptions (e.g.,
linear relationships between variables) which are seldom met in ecological data. 

Chapter 3 is a comparison of different combinations of data standardization, distance
measures, and ordination methods. The purpose of this study is to determine the relative
effect some of our decisions have in contributing to the results of studies. For example, if 
choices of data standardization and distance measure contribute little to variation in our 
interpretation relative to the choice of ordination method, then we may be more confident that 
studies using identical ordinations, but different data treatments are comparable and do not 
simply reflect a set of choices in analysis. 

Chapter 4 examines methods of evaluating the underlying dimensionality of 
multivariate data. Heuristic and statistical methods of assessing the number of non-trivial
components from principal components analysis are compared. Studies using multivariate
methods often use one of these approaches or simply interpret an arbitrary number of
components. The correct assessment of the number of interpretable components is important.
If too few components are interpreted, valuable information about the community may be
overlooked. On the other hand, retention of too many components leads to the interpretation
of relatively meaningless information (i.e., sampling variation or 'noise'). Limited
comparisons of some assessment methods have been made in the educational and
psychometric literature, but more complete evaluations have not been carried out, nor have
these methods been evaluated with ecological data. 

Chapter 5 addresses the aspects of community-environment concordance and the
inter-lake associations based on the different communities. Our work focuses on the distributions
and associations of fish species and benthic invertebrates in a set of lakes from south-central
Ontario. Specific questions considered include: (1) Do fish communities show greater
association with lake water chemistry or lake morphometric conditions?; (2) Do benthic 
invertebrate communities show greater association with lake water chemistry or lake 
morphometric conditions?; and (3) Do similar inter-lake patterns of association exist in both 
the fish and benthic communities? Decisions regarding data standardization and ordination 
techniques used in this chapter were based on information gained from Chapters 2-4. The 
overall test of concordance between the two communities was evaluated using PROTEST 
(Chapter 2). 

Chapter 2: PROTEST: A PROCRUSTEAN RANDOMIZATION TEST OF COMMUNITY-ENVIRONMENT CONCORDANCE

ABSTRACT 

A multivariate measure of the concordance or association between matrices of species 
abundances and environmental variables was generally lacking in ecology until recently. 
Traditional statistical procedures comparing such relationships are often unsuitable because of 
non-linearity among species and/or environmental data. To address these problems, a 
randomization test based on Procrustes analysis is proposed. One matrix is subject to 
reflection, rigid rotation, translation, and dilation to minimize the residual deviations between 
points for each observation and the identical observation in the target matrix. This is a 
classical Procrustes approach to matrix analysis. To assess the significance of this measure of 
matrix concordance, a randomization test is used to determine whether the sum of residual 
deviations is less than that expected by chance. The PROcrustean randomization TEST 
(PROTEST) may be used with either raw data matrices or with multivariate summaries of the 
original data (i.e., both direct or indirect gradient analysis). Examples are provided of 
PROTEST analyses with benthic invertebrate communities, lake-water chemistry, lake 
morphology, and lake geographic position. Significant concordance was found between the benthic
community and both lake water chemistry and geographic position. PROTEST
results differed from Mantel test results as the choice of distance measure with Mantel tests 
will influence the level of significance obtained. 

INTRODUCTION

Assessing the degree of concordance between data matrices has proven a vexing 
problem in many fields. The inherent multidimensionality of psychological and biological 
data has resulted in the development of numerous statistical procedures to assess the similarity 
or redundancy between data sets (e.g., see Gauch 1982a; Pielou 1984; Legendre and Legendre 
1983; Jongman et al. 1987). One complication is the fact that such relationships are often 
nonlinear even though most statistical methods implicitly assume linear relationships between 
variables (e.g., Austin 1984). Although similar problems exist in many fields, the examples 
used are from community ecology to demonstrate an alternative technique for testing 
concordance between data matrices. 

In the simplest approach to assess concordance between data sets, ecologists calculate 
tables of bivariate correlations (direct gradient analysis), t-tests or other statistics representing 
all possible combinations of species and environmental variables (e.g., Evans and Waring 
1987; Henderson and Fry 1987; Marshall and Ryan 1987). An obvious drawback to these 
tables is the increased likelihood of committing a Type I error. Typically, dozens or hundreds 
of pairwise correlations are reported, frequently with little recognition of the statistical 
problem associated with multiple comparisons (e.g., without applying a Bonferroni correction).
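
The scale of the multiple-comparison problem can be illustrated with a short calculation. The following Python sketch assumes that the tests are independent (an assumption that real species-environment correlations seldom satisfy) and uses a hypothetical table of 16 taxa by 12 environmental variables; the function names are illustrative only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical table: every taxon correlated with every environmental variable.
n_tests = 16 * 12
print("number of correlations:", n_tests)
print("chance of at least one false positive:", familywise_error_rate(0.05, n_tests))
print("Bonferroni per-test alpha:", bonferroni_alpha(0.05, n_tests))
```

With nearly two hundred correlations, at least one spurious "significant" result is close to certain unless the per-test significance level is reduced accordingly.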

A more common approach is the use of multivariate techniques (e.g., principal 
components analysis [PCA] or correspondence analysis [CA]) to summarize variation in the 
species data (i.e., the response matrix). The resulting subset of axes is then correlated with
the environmental variables in what is commonly called indirect gradient analysis (e.g.,
Whittaker 1973; Gauch 1982a). Unfortunately, even indirect gradient analysis can result in
large numbers of correlations (see Digby and Kempton 1987; Jongman et al. 1987).

Numerous other methods have been proposed to examine concordance between 
matrices. One such approach is canonical correlation analysis (CCA) and the related 
redundancy analysis (Carleton 1984; Gittins 1985; Corkum and Ciborowski 1988). Although 
these methods are mathematically and conceptually simple, they are often criticized for 
assuming linear relationships between variables within each matrix and between matrices. As 
a result, ecologists infrequently use these methods (but see Gittins 1985). Recent 
developments using nonlinear models in CCA may address such criticisms (van der Burg and 
de Leeuw 1983; van der Burg 1988). 

Another approach is canonical correspondence analysis (CANOCO; ter Braak 1986, 
1987a,b). This method uses CA (or the variant called detrended correspondence analysis 
[DCA]; Hill and Gauch 1980) to ordinate community data, thereby better representing 
nonlinear relationships between species data (Gauch 1982a; Peet et al. 1989). However, 
CANOCO-based approaches still assume linear relationships between environmental variables 
(ter Braak 1987b). If the environmental data do not meet this requirement, which is 
frequently the situation (e.g., McIsaac et al. 1987), then CANOCO results may be affected
adversely. 

An alternative test to assess the degree of association between two matrices is the
Mantel test (Mantel 1967; Dietz 1983; Legendre and Fortin 1989; also called Quadratic
Assignment; Hubert and Schultz 1976; Dow and Cheverud 1985). This test is an extension of 
a simple correlation between two inter-point distance matrices (e.g., the cophenetic correlation 
coefficient; Sokal and Rohlf 1962). Previously, the statistical significance of such correlations 
was not available because distances within a given matrix are not independent (Gower 1971; 
Dietz 1983; Jackson and Somers 1989). The Mantel test overcomes this limitation and 
evaluates inter-matrix correlations using randomization tests (Edgington 1987; Manly 1991). 
Randomization tests permit one to determine whether an observed statistical measure is more 
extreme than expected by chance (see details in methods). The Mantel test and an associated 
analysis-of-variance-based technique (Legendre et al. 1990) are recent proposals to examine 
the role of spatial autocorrelation in ecology (i.e., the influence of location on ecological 
comparisons; e.g., Sokal 1979; Legendre and Legendre 1983; Burrough 1987; Jumars et al. 
1987; Legendre and Fortin 1989). 
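
The logic of the Mantel test can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study: it assumes Euclidean distances, a Pearson correlation between the upper triangles of the two distance matrices, and a one-tailed test for positive association. The function names and the simulated coordinates are hypothetical.

```python
import numpy as np


def euclidean_distances(X):
    """Return the n x n matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mantel(D1, D2, n_perm=999, seed=0):
    """One-tailed Mantel test: correlation between two distance matrices."""
    rng = np.random.default_rng(seed)
    n = D1.shape[0]
    upper = np.triu_indices(n, k=1)  # each pair of sites counted once
    observed = np.corrcoef(D1[upper], D2[upper])[0, 1]
    n_as_large = 0
    for _ in range(n_perm):
        # Permute rows and columns together so that D1 stays a valid distance matrix.
        perm = rng.permutation(n)
        D1_perm = D1[np.ix_(perm, perm)]
        if np.corrcoef(D1_perm[upper], D2[upper])[0, 1] >= observed:
            n_as_large += 1
    # The observed value is counted as one member of the reference distribution.
    return observed, (1 + n_as_large) / (1 + n_perm)


# Simulated data: two sets of site coordinates that differ only by small noise.
rng = np.random.default_rng(1)
sites_a = rng.normal(size=(15, 2))
sites_b = sites_a + rng.normal(scale=0.05, size=(15, 2))
r, p = mantel(euclidean_distances(sites_a), euclidean_distances(sites_b))
print("Mantel r:", r, "P:", p)
```

Because rows and columns are permuted jointly, the dependence among distances within each matrix is preserved in every randomization, which is why the permutation distribution provides a valid reference where a parametric test of the correlation does not.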

Another recent development in inter-matrix comparison is Procrustes analysis (Gower 
1971, 1975; Schonemann and Carroll 1970; Digby and Kempton 1987), also known as
orthogonal mapping (Olshan et al. 1982), generalized rotational fit (Siegel 1982; Rohlf 1990; 
Rohlf and Slice 1990), and least-squares theta-rho-analysis (Chapman 1990). Rather than 
comparing inter-observation distance matrices as in the Mantel test, Procrustes analysis 
minimizes the sum-of-squared deviations between data values in two observation-by-variable
matrices through matrix translation, reflection, rigid rotation, and dilation. This is an 
extension of the comparative bi-orthogonal grid method proposed by Sneath (1967). In the
Procrustean approach, each observation is represented by a point X_i (i = 1, ..., n) in one matrix
and a corresponding point Y_i in the second matrix. The two configurations of points are then
matched to maximize their fit using rotation, translation, reflection, and dilation of matrix X,
resulting in X'. The criterion used to assess best fit is the minimization of the residual sum of
squares between the points for each observation:

$$m^2 = \sum_{i=1}^{n} \Delta^2(X'_i, Y_i)$$

where \Delta(X'_i, Y_i) is the distance between points X'_i and Y_i.

To demonstrate the basic approach of Procrustes analysis, two triangles are simulated 
that differ in orientation, size, and location (Fig. 2.1). With the Procrustean rotation, the first 
triangle (A-B-C) is repositioned (a-b-c) to maximize the concordance with the second 'target' 
triangle (X-Y-Z). The sum-of-squared residuals between a-X, b-Y, and c-Z has been 
minimized. Therefore, the rotated matrix (a-b-c) provides the closest fit to the target matrix 
(X-Y-Z). 
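
The triangle example can be reproduced numerically. The sketch below is one common least-squares solution (centring, scaling each configuration to unit sum of squares, then a singular value decomposition to find the rotation or reflection); it is offered as an illustration under those assumptions rather than as the exact algorithm used here. The function name and the triangle coordinates are hypothetical.

```python
import numpy as np


def procrustes_fit(X, Y):
    """Fit configuration X to target Y; return (m2, fitted X, standardized Y)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Translation: move both centroids to the origin.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Standardization: scale each configuration to unit sum of squares.
    Xc = Xc / np.linalg.norm(Xc)
    Yc = Yc / np.linalg.norm(Yc)
    # Rotation/reflection: orthogonal matrix maximizing the trace of Q' Xc' Yc.
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    Q = U @ Vt
    # Dilation: the optimal scale factor is the sum of the singular values.
    fitted = s.sum() * (Xc @ Q)
    m2 = float(np.sum((Yc - fitted) ** 2))
    return m2, fitted, Yc


# Triangle A-B-C and a target X-Y-Z that differs in orientation, size, and location.
abc = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta = np.deg2rad(40.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
xyz = 2.5 * (abc @ rotation) + np.array([3.0, -1.0])

m2, fitted, target = procrustes_fit(abc, xyz)
print("m2:", m2)  # effectively zero: the triangles differ only by the allowed transformations
```

Because the two triangles differ only in orientation, size, and location, the residual sum of squares after fitting is zero to within rounding error. Configurations that differ in shape would leave a positive residual between 0 and 1 under this standardization.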

Figure 2.1. Example of Procrustean rotational fit. Triangle ABC is rotated to new
coordinates (a-b-c) to minimize the sum-of-squared residuals between the rotated triangle and
X-Y-Z.

Gower (1975) advocated future work to develop and extend Procrustean-type measures
in an analogue of the analysis of variance. Unfortunately, little progress has been made in
this direction. Instead recent work has focused on the use of Procrustes analysis to calculate
distance statistics in the comparative analysis of landmark data (Chapman 1990; Rohlf 1990),
although this particular application in morphometrics is challenged (Bookstein 1990). Digby
and Kempton (1987), Gower (1987) and Rohlf and Slice (1990) provide detailed discussions
of both Procrustes analysis and generalized Procrustes analysis (for comparing several
matrices simultaneously).

Procrustean methods are used infrequently in ecology. This lack of use likely reflects
the limited availability of the procedure (GENSTAT is the only major statistical package 
providing Procrustes analysis, although programs are also available in Siegel [1982] and Rohlf 
[1990]). Fasham and Foxton (1979) used Procrustes analysis to determine the relative 
importance of various environmental factors influencing the vertical distributions of pelagic 
decapoda. Kenkel and Bradfield (1986) examined the relationship between epiphytic 
vegetation and environmental conditions. Jackson and Somers (1991) used the technique to 
compare analytical results from various ordinations of community data. In addition, Rising 
and Somers (1989) compared different morphometric analyses of birds using Procrustean 
analyses. 

In this paper, a test is described that determines whether the degree of concordance 
between two matrices is greater than expected given random inter-matrix association. A 
simple derivation of this test (PROTEST) is described and its use shown with several
ecological data matrices. With these examples, its suitability is demonstrated for both indirect
(i.e., multivariate data summaries) and direct gradient analysis (i.e., using the original raw 
data). As well, PROTEST is contrasted with the Mantel test approach. 



METHODS 

PROTEST: PROcrustean Randomization TEST 

In Procrustes analysis, a pair of data matrices is compared by using a rotational-fit
algorithm that minimizes the sum-of-the-squared residuals between the two matrices (i.e., the m²
statistic; Gower 1971, 1975; Digby and Kempton 1987; Rohlf and Slice 1990). Residuals
between the original values and the best-fit solution are calculated for each location (or
observation) to identify outlying and deviant points. The resultant m² value is a goodness-of-fit
statistic that describes the degree of concordance between two matrices (i.e., how closely the
two data configurations match). Throughout this study, the standardized form of Procrustes
analysis is used where all variables are standardized prior to the analysis, to maintain 
symmetry. Using this approach, it does not matter which matrix is rotated or which is the 
target because the resulting m² statistics are the same. When the standardized form of the
analysis is not used, one matrix must be designated a priori as the target matrix. 

To evaluate the significance of an observed m² statistic, a randomization test is
developed (Edgington 1987; Manly 1991). Randomization tests estimate the significance of
an observed statistic relative to a large number of values for the same statistic that are 
generated by randomly permuting the original data (i.e., by randomly reshuffling the data). 
This approach estimates the probability of observing a given statistic even though the 
moments for that statistic may be algebraically intractable. In this application, each site label 
from one matrix is permuted randomly in such a way that each site can assume any of the 
possible values for a given variable, but values are not inter-changed among variables nor is 
the within-matrix covariance structure changed. The concordance between the resultant 
randomized matrix and the original target matrix is calculated. This randomization procedure 
is repeated a large number of times to derive a distribution of values representing random 
association between the matrices. This test is a one-tailed test that counts the number of 
random m² statistics that have a residual sum of squares smaller than or equal to the observed
m² value. For example, if 1999 randomizations are completed and 9 sums of squares are less
than or equal to the observed m² statistic, then the resultant probability of the observed m²
value is (1 + 9)/(1 + 1999) = 0.005, where the 1 in both the numerator and denominator represents the
observed m² value. Further descriptions and examples of randomization methods are provided
by Edgington (1987) and Manly (1991). 
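
The randomization procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used in this study: the standardized m² is computed from a singular value decomposition, whole rows (sites) of one matrix are permuted so that its within-matrix covariance structure is unchanged, and the data are simulated. The function names, the seed, and the simulated matrices are hypothetical.

```python
import numpy as np


def procrustes_m2(X, Y):
    """Standardized (symmetric) Procrustes residual sum of squares."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc)
    Yc = Yc / np.linalg.norm(Yc)
    singular_values = np.linalg.svd(Xc.T @ Yc, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2


def protest(X, Y, n_perm=1999, seed=0):
    """One-tailed randomization test of the observed m2 statistic."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = procrustes_m2(X, Y)
    n_as_small = 0
    for _ in range(n_perm):
        # Shuffle site labels of X only; rows stay intact, so values are never
        # exchanged among variables.
        shuffled = X[rng.permutation(X.shape[0])]
        if procrustes_m2(shuffled, Y) <= observed:
            n_as_small += 1
    # The observed value is included in both numerator and denominator.
    return observed, (1 + n_as_small) / (1 + n_perm)


# Simulated example: 19 sites, two axes, second matrix is a rotated, noisy copy.
rng = np.random.default_rng(42)
community = rng.normal(size=(19, 2))
theta = np.deg2rad(30.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
environment = community @ rotation + rng.normal(scale=0.2, size=(19, 2))

m2, p = protest(community, environment)
print("m2:", m2, "P:", p)
```

With 1999 randomizations the smallest attainable probability is 1/2000 = 0.0005, which is why the number of randomizations limits the resolution of the reported P values.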

Ecological Data

Nineteen lakes in south-central Ontario were surveyed using a standard protocol to 
measure benthic invertebrate abundance, lake-water chemistry, lake morphology and 
geographic location (Harvey and Lee 1981). The resultant data matrices for benthos, 
morphology, and water chemistry were each ordinated to produce two-dimensional 
multivariate summaries (see below). Only the first two axes were retained from each solution 
because these axes summarize the greatest amount of variation and are easily compared to the 
geographic location (i.e., latitude and longitude) of the lakes. 

Counts of the benthic invertebrates represented 16 major taxonomic groups. These 
counts were log(x+1) transformed and analyzed with CA because CA is better able to
represent longer ecological gradients than methods such as PCA (Gauch 1982a). 

The lake morphology data matrix, consisting of lake surface area, volume, shoreline 
perimeter, and maximum depth, was analyzed using a PCA of the correlation matrix of log- 
transformed variables. As these variables showed strong linear bivariate relationships, a PCA 
was chosen as the most appropriate ordination technique to summarize variation in two 
dimensions. 
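
A PCA of the correlation matrix of log-transformed variables can be sketched as below. The data are simulated stand-ins for the four morphology variables, the base of the logarithm (base 10 here) is an assumption because it is not stated above, and the function name is hypothetical.

```python
import numpy as np


def pca_correlation(data, n_axes=2):
    """PCA of the correlation matrix of log10-transformed, strictly positive data."""
    logged = np.log10(np.asarray(data, dtype=float))
    # Standardizing each variable makes this equivalent to a PCA of the correlation matrix.
    z = (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)
    corr = np.corrcoef(logged, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs[:, :n_axes]
    explained = eigvals[:n_axes] / eigvals.sum()
    return scores, explained


# Simulated lakes: four size-related variables driven by one underlying size factor.
rng = np.random.default_rng(3)
size = rng.normal(size=19)
morphology = np.column_stack([
    np.exp(1.0 * size + rng.normal(scale=0.2, size=19)),  # surface area
    np.exp(1.3 * size + rng.normal(scale=0.3, size=19)),  # volume
    np.exp(0.6 * size + rng.normal(scale=0.2, size=19)),  # shoreline perimeter
    np.exp(0.4 * size + rng.normal(scale=0.3, size=19)),  # maximum depth
])

scores, explained = pca_correlation(morphology)
print("proportion of variance on axes 1 and 2:", explained)
```

The returned site scores form an observation-by-axis matrix of the kind that can then be compared with another two-column matrix, such as latitude and longitude.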

Lake water chemistry included lake pH, alkalinity, calcium, magnesium, potassium, 
chloride, sulphate, dissolved organic carbon, dissolved inorganic carbon, phosphorus, nitrate, 
and conductivity. These chemistry variables, excluding pH, were log(x+1) transformed and
analyzed with CA because of curvilinear inter-variable relationships. CA was chosen because
it is more robust in representing such data relationships. The first two axes from each of the
three matrices plus lake geographic location were then compared using PROTEST.

Although equivalent rank may be achieved by the addition of vectors of zeroes (ten 
Berge and Knol 1984), such modifications may lead to problems with the standardization of 
the matrix necessary for symmetrical m² solutions. Therefore, in direct gradient analysis,
PROTEST may be restricted to tests involving matrices of equal rank if the symmetrical 
result is preferred. Alternatively, specific directed hypotheses are possible for direct gradient 
analysis of matrices of unequal rank (e.g., "Is the benthos influenced by the water 
chemistry?", but not "Is the water chemistry influenced by the benthos?") when target 
matrices are identified a priori. 

RESULTS AND DISCUSSION

The spatial arrangement of the nineteen lakes on the first two axes from the ordination
of the benthic invertebrate abundances shows marked similarity to the observed geographic
positions (Fig. 2.2a,b). The same groupings of lakes (e.g., lakes A, B, C, D, and E; see Table
2.1 for lake names and codes) are apparent in both plots. The fit of the two spatial
arrangements using PROTEST is significant (m² = 0.722; P = .02; Fig. 2.3). This similarity
is further emphasized by the best-fit arrangement of the lakes (Fig. 2.2c). The deviations are 
displayed in a plot of the residuals between the best-fit matrix and the target matrix (i.e., the 
geographic matrix; Fig. 2.2d). Deviations of individual points are indicated by the line 
segment between the best-fit position of a given lake and the target matrix position of that 
lake (i.e., the longer the line, the greater the residual between the best-fit and target points).
Lakes M and H exhibit the largest residuals, although residuals for lakes F and Q are nearly
as large. These four points represent lakes with atypical water-chemistry conditions, such as
acidified lakes. 

The relatively strong concordance between the benthos ordination and lake geographic 
position indicates that the invertebrate communities exhibit a strong biogeographical (i.e., 
spatially correlated) pattern. Given the watershed arrangement of these lakes, such a pattern 
is not surprising. Adjacent lakes are expected to have more in common biologically than 
distant lakes. Due to the drainage patterns and geological similarities of the drainage basins 
of the lakes, nearby lakes are also expected to have similar chemical conditions. However, 



Table 2.1. Codes and lake names.

Code  Lake
A     Solitaire
B     Buck
C     Walker
D     Little Clear
E     Harp
F     Clear
G     Heeney
H     Plastic
I     Gullfeather
J     Leech
K     Dickie
L     Red Chalk
M     Basshaunt
N     Chub
O     Blue Chalk
P     Bigwind
Q     Fawn
R     Crosson
S     Cinder

Figure 2.2a. First two axes from a correspondence analysis of benthic invertebrate data.
Letters represent lakes (i.e., observations) used in data analysis. Lake names associated with
codes are listed in Table 2.1.

Figure 2.2b. Plot of lake position in geographic space. Letters correspond to lakes from
Figure 2.2a.

Figure 2.2c. Best-fit configuration of benthic invertebrate data from Figure 2.2a rotated to
matrix of geographic position.

Figure 2.2d. Line segments are residuals between the rotated best-fit invertebrate matrix and
the geographic position of lakes. Letters indicate geographic position of lakes and opposing
end of line segments is best-fit matrix.

Figure 2.3. Distribution of m² statistic based on 10,000 randomized matrices of correspondence
analysis results rotated to the geographic matrix.

the PROTEST results show a lack of concordance between the lake-water chemistry
ordination and geographic location (m² = 0.805; P = .110). Similarly, there is no concordance
between the ordination of lake morphology and geographic location (m² = 0.796; P = .108).

A comparison of the zoobenthic ordination and the water-chemistry ordination shows 
significant concordance (m² = 0.720; P = 0.021). This result suggests that water chemistry
may affect the structure of the invertebrate community. Environmental conditions will affect
the invertebrate community composition either directly via physiological mechanisms (e.g.,
Hall 1990) or indirectly through changes in fish predation (Bendell and McNichol 1987;
Glazier and Gooch 1987; Post and Cucin 1984; Run et al. 1990). In contrast, the zoobenthos
is not associated significantly with the lake morphology ordination (m² = 0.858; P = 0.330).

PROTEST may be used to test for multivariate concordance between community-based 
ordinations or the original species-composition data and the original environmental variables 
(i.e., direct gradient analysis). Note that neither PROTEST nor the Mantel test may be used
to test the relationship between a data set and the results of an analysis on that data set. 

To demonstrate PROTEST in direct gradient analysis, the original abundances of 16 
benthic taxa were contrasted with a matrix of combined water-chemistry and lake- 
morphometry variables. The combined matrix of environmental variables was chosen to 
maintain matrices of equal rank for the PROTEST analysis. PROTEST provided an m² value
of 0.590 (P = 0.845) between the benthic invertebrate counts and the combined
water-chemistry and morphometry data. This result showed a much lower degree of concordance
than that resulting from indirect gradient approaches using the first two CA axes from benthic 
invertebrates with either lake morphology or lake chemistry. This result may arise because: 
(1) The original benthic and environmental data contain large amounts of 'noise' which 
confound community patterns. The multivariate ordination axes do not retain this 'noise', 
thereby providing a clearer representation of the community and the environment (Gauch 
1982b); (2) The lack of concordance between multivariate solutions based on lake chemistry 
and morphology data (m* = 0.806; P = .141) indicates that these two data sets exhibit 
different patterns. Therefore, the combination of both environmental data sets may provide 
discordant results with the invertebrate data; (3) Alternatively PROTEST does not adequately 
measure matrix concordance in this application. Results from simulated data, field data, and 
morphometric data indicate that PROTEST is a suitable measure of inter-matrix concordance. 
Other studies using Procrustean methods indicate that Procrustes analysis is well-suited for 
assessing concordance between matrices (e.g., Gower 1971,1975; Rohlf 1990; Rohlf and Slice 
1990). 

Results using PROTEST are in sharp contrast to those obtained from the Mantel test. 
Mantel results were obtained using the same axes used in PROTEST. Distance matrices were 
constructed using Euclidean distances between points. Only the comparison between lake- 
water chemistry and geographic position provided significant results (P = .011; Table 2.2). 
Whereas PROTEST results showed significant concordance between the benthic ordination 
results and geographic position, no such association was found using the Mantel test (P = 
.140). Similarly, Mantel results indicated no association between the ordinations of benthic 
invertebrates and water chemistry (P = .353), although the PROTEST results were 
significantly different from random. 

The large number of measures available for constructing distance matrices is both a strength 
and a weakness of the Mantel approach. Given this variety (e.g., Legendre and Legendre 1983), a careful 
researcher may choose a priori a measure to emphasize particular attributes believed 
important. For example, the use of matrices of inverse distances in the Mantel test 
emphasizes local or small-scale relationships while down-weighting large-scale relationships 
(e.g., Jackson and Harvey 1989). Using inverse-distance matrices, there was a significant 
association between water chemistry and geography (P = .008; Table 2.2) and weaker 
associations between water chemistry and lake morphology (P = .083) and between the 
benthos and geographic position (P = .106). However, such choices in distance matrices are 
not without their drawbacks. As with methods such as cluster analysis, one may be tempted 
to test many different distance measures until the desired result is found. PROTEST may 
also capitalize on the numerous methods available to summarize multivariate patterns. In 
fact, PROTEST results using standardized matrices are equivalent to a scalar multiple of the 
Mantel statistics based on a squared Euclidean distance (C.J.F. ter Braak, pers. comm.). 
Alternatively, one may also choose to avoid such subjective decisions by using PROTEST on 
the original data matrices, thereby avoiding intermediate stages of calculating distance 
matrices and ordinations. 
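
For comparison, the Mantel procedure discussed above can be sketched as follows. This is an illustrative outline only: it assumes NumPy, a one-tailed test based on the Pearson correlation between the off-diagonal elements of two distance matrices, and hypothetical function names (`euclidean_matrix`, `inverse_distances`, `mantel`). The inverse-distance helper shows one way to build the matrices that emphasize small-scale relationships.

```python
import numpy as np

def euclidean_matrix(X):
    """Pairwise Euclidean distances between the rows (sites) of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def inverse_distances(D):
    """Inverse of each non-zero distance; the zero diagonal is left at zero."""
    D = np.asarray(D, dtype=float)
    out = np.zeros_like(D)
    positive = D > 0
    out[positive] = 1.0 / D[positive]
    return out

def mantel(D1, D2, n_perm=999, seed=0):
    """Return (Mantel correlation, one-tailed permutation P-value)."""
    D1 = np.asarray(D1, dtype=float)
    D2 = np.asarray(D2, dtype=float)
    n = D1.shape[0]
    upper = np.triu_indices(n, k=1)  # each pair of sites counted once
    r_obs = np.corrcoef(D1[upper], D2[upper])[0, 1]
    rng = np.random.default_rng(seed)
    count = 1  # the observed arrangement is counted as one permutation
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute rows and columns of one matrix together
        r = np.corrcoef(D1[upper], D2[np.ix_(p, p)][upper])[0, 1]
        if r >= r_obs:
            count += 1
    return r_obs, count / (n_perm + 1)
```

Passing `inverse_distances(D)` in place of `D` for both matrices gives the inverse-distance variant; nothing else in the test changes, which is why the choice of distance measure is so easy to vary.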

Table 2.2. Probabilities of obtaining equal or more extreme Mantel statistics with the 
distance matrices. Upper triangle based on matrices of Euclidean distances between points 
and lower triangle based on matrices of the inverse of Euclidean distances. 

              Benthos   Chemistry   Morphology   Geography
Benthos          -        .353        .243         .140
Chemistry      .380         -         .448         .011
Morphology     .342       .083          -          .497
Geography      .106       .008        .120           -

PROTEST provides an objective means to evaluate the concordance of multivariate 
patterns. The conceptual simplicity of Procrustes analysis, coupled with permutation methods, 
provides an alternative to present approaches. Procrustean methods will out-perform 
traditional methods because PROTEST makes no implicit assumptions regarding the inter- 
variable relationships (e.g., linearity, bivariate normality, homogeneity of variance and 
covariance). As a result, Procrustes-type approaches hold considerable promise for future 
developments in light of the frequent disappointments with the application of traditional 
multivariate methods to ecological data sets (e.g., James and McCulloch 1990). 

Chapter 3: MULTIVARIATE ANALYSIS OF BENTHIC INVERTEBRATE 
COMMUNITIES: THE IMPLICATION OF CHOOSING PARTICULAR DATA 
STANDARDIZATIONS, MEASURES OF ASSOCIATION, AND ORDINATION 
METHODS 



ABSTRACT 

Benthic invertebrate data from thirty-nine lakes in south-central Ontario were analyzed to 
determine the effect of choosing particular data standardizations, resemblance measures, and 
ordination methods on the resultant multivariate summaries. Logarithmically transformed, 0-1 
scaled, and ranked data were used as standardized variables with resemblance measures of 
Bray-Curtis, Euclidean distance, cosine distance, correlation, covariance and chi-squared 
distance. Combinations of these measures and standardizations were used in principal 
components analysis, principal coordinates analysis, nonmetric multidimensional scaling, 
correspondence analysis, and detrended correspondence analysis. Correspondence analysis 
and principal components analysis using a correlation coefficient provided the most consistent 
results irrespective of the choice in data standardization. Other approaches using detrended 
correspondence analysis, principal components analysis, principal coordinates analysis, and 
nonmetric multidimensional scaling provided less consistent results. These latter three 
methods produced similar results when the abundance data were replaced with ranks or 
standardized to a 0-1 range. The log-transformed data produced the least consistent results, 
whereas ranked data were most consistent. Resemblance measures such as the Bray-Curtis 
and correlation coefficient provided more consistent solutions than measures such as 
Euclidean distance or the covariance matrix when different data standardizations were used. 
The cosine distance based on standardized data provided results comparable to the CA and 
DCA solutions. Overall, CA proved most robust as it demonstrated high consistency 
irrespective of the data standardizations. The strong influence of data standardization with the 
other ordination methods emphasizes the importance of this frequently neglected stage of data 
analysis. 



INTRODUCTION 

With increasing frequency, benthic invertebrate communities are used as indicators of 
environmental degradation or restoration (e.g., Cairns and Dickson 1971; Hruby 1987; Clarke 
and Green 1988) because the benthos broadly reflect environmental conditions. The 
complexity of benthic communities has led many researchers to adopt multivariate approaches 
to summarize patterns of species' abundance and co-occurrence. Multivariate methods are 
viewed as 'objective' techniques allowing greater understanding of the structure of the 
community and relationships with corresponding environmental conditions (e.g., Smith et al. 
1988; Burd et al. 1990). However, the occurrence of multiple and discordant, but equally 
'objective' solutions from the same data is rarely acknowledged (e.g., see Beals 1973; James 
and McCulloch 1990). These differences are due to subjective choices surrounding data 
standardization, the measure of similarity, and the type of ordination method. 

Many approaches can be undertaken to analyze community data. A common one is to 
select a method of data analysis, often based on past experience, and assume that the resultant 
summary adequately models the underlying structure of the data. This rather subjective 
approach is quite common given that most published studies present a single solution and 
rarely identify whether other analyses were done. An alternative is to use several different 
measures of similarity or distance and a variety of ordination or clustering methods and then 
compare the results (e.g., Green 1979; Digby and Kempton 1987; Hruby 1987; Jackson et al. 
1989; Rohlf 1990a). This comparative approach provides qualitative evidence of the robust 
nature of the results. If one finds similar patterns using a variety of methods, then the results 
should be fairly representative (Green 1979). Patterns differing greatly amongst various 
analyses suggest that the underlying community structure is poorly defined. Assuming that 
one concludes, based on some arbitrary comparison, that the results are sufficiently similar, it 
remains a subjective choice as to which solution is presented. Often this choice is based on 
which solution is 'most interpretable' with regard to, hopefully, a priori hypotheses. 

Amongst the studies published regarding benthic invertebrates, great variation exists in 
the importance attributed to environmental influences on community structure. Water-chemistry 
characteristics such as pH (Harvey and McArdle 1986; Townsend et al. 1987), 
heavy metal contamination (Leland et al. 1986), municipal effluent (Stull et al. 1986), and 
salinity (Hughes and Thomas 1971) have been suggested as factors affecting the benthic 
community composition. Sediment characteristics are frequently important (e.g., Cassie and 
Michael 1968; Allison and Harvey 1988), although Ormerod and Edwards (1987) found 
sediments to be unimportant. In addition, biotic factors, such as fish predation, may (Bendell 
and McNichol 1987) or may not (Townsend et al. 1987) be important. Given the variety of 
findings in benthic studies, one might conclude that there are no environmental factors which 
are important in all studies, even within a particular region. Although there is considerable 
variation from one study to another, we cannot exclude the possibility that the patterns 
reported in the literature may arise largely from different methods of analysis. 

Benthic studies have used a variety of analytical methods with limited comparisons 
between these approaches. Historically most studies used principal components analysis 
(PCA; e.g., Cassie and Michael 1968; Hughes and Thomas 1971), and the method remains 
popular (e.g., Glazier and Gooch 1987; Smith and Pearson 1987). However, criticism 
regarding the use of PCA with data exhibiting nonlinear relationships between species (e.g., 
Noy-Meir and Austin 1970; Austin 1976a,b; Sheldon and Haick 1981) has prompted the use 
of other methods such as principal coordinates analysis (PCoA; Stephenson and Williams 
1971; Stull et al. 1986), nonmetric multidimensional scaling (NMDS; King et al. 1988), 
correspondence analysis (CA), and detrended correspondence analysis (DCA; Furse et al. 
1984; Wright et al. 1984; Bunn et al. 1986; Townsend et al. 1987). Given the possible 
choices amongst these ordination methods, it is difficult to determine how much of the 
variation attributed to environmental factors is actually due to the choice of ordination. 

Other steps in data analysis may contribute to differing results as well. The first 
choice in any analysis is that of data transformation and standardization. Ecological data 
typically do not meet the assumptions of normality and homogeneity of variance (e.g., Elliott 
1977; Downing 1979). Therefore, the data are transformed to approximate these conditions. 
However, the literature shows a wide range in the choice of transformation. Leland et al. 
(1986), Bradt and Berg (1987), and Townsend et al. (1987) used untransformed data; Hughes and 
Thomas (1971), Hruby (1987), Graca et al. (1989), and Doeg et al. (1989) used log-transformed 
data; Clarke and Green (1988) employed a fourth-root transformation; Corkum 
and Ciborowski (1988) used an octave transformation; and numerous other forms and 
examples exist (e.g., see Elliott 1979). In addition, data standardization is employed to 
change the relative weight that each variable or sample contributes to the analysis. If the 
absolute abundance of species is considered important then the analysis should emphasize the 
absolute abundance and use unstandardized data. If interest is focused on relative abundance, 
then some type of standardization is necessary. Choices used previously include: 
(1) subtracting the mean from each datum of a particular taxon (Smith et al. 
1988); (2) subtracting the mean and dividing by the standard deviation (i.e., z-scores; Cassie 
and Michael 1968); (3) dividing each taxon by its maximum value, resulting in data having a 
0-1 scale (Legendre and Legendre 1983); (4) converting to ranks (Green 1979; Glazier and 
Gooch 1987); or (5) using proportions or percentages (Ormerod 1988; Rutt et al. 1989; Rabeni 
and Gibbs 1980). 

Similarly, the choice of the measure of distance or resemblance between the taxa or 
sampling sites also varies amongst studies. Commonly used measures include: the Bray-Curtis 
coefficient (e.g., Stephenson and Williams 1971; King 1981; Mitchie 1982; Smith et al. 1988; 
Graca et al. 1989); Euclidean distance (Glazier and Gooch 1987; Hruby 1987); simple 
product-moment correlations (Hughes and Thomas 1971; Jenkins et al. 1984); and the 
chi-squared distance implicit in CA and DCA (Bunn et al. 1986; Leland et al. 1986; Townsend et 
al. 1987). Further discussion of resemblance measures may be found in Orloci (1978), Green 
(1980), Legendre and Legendre (1983), Pielou (1984), and Digby and Kempton (1987). 

Clearly there are many possible choices in community-type data analyses and, given 
the plethora of possibilities, few studies have employed the same analytical approaches. 
However, one important consideration in any community study is the consistency of the 
results from various combinations of data standardization, distance measure and ordination 
method. For example, if one ordination procedure appears relatively insensitive to changes in 
data standardization or distance measure, then its results can be regarded as robust to those choices. This 
is particularly important when comparing published results using different data 
standardizations and distance measures, but identical methods of ordination. On the other 
hand, one may find that the data standardization produces results that dominate all other 
choices in the analysis. The following study examines the relative importance of choosing 
among the various options from data standardization, distance measure, and ordination to 
provide recommendations for future benthic-community studies. 

METHODS 

Data Collection 

Thirty-nine lakes in south-central Ontario were sampled to examine the community 
structure of the benthic invertebrates. SCUBA divers collected ten 5.21 cm diameter cores 
from each of six transects per lake and these cores were preserved immediately in 10% 
formalin. Subsequently, invertebrates were separated from the sediment by floatation in a 
sugar solution in the laboratory (Allison and Harvey 1981). The animals were identified into 
thirty-two taxonomic groups, and the numerical abundance (i.e., counts) of each taxon was 
recorded. Identification was to the familial and ordinal level following Allison and Harvey 
(1981), given that Warwick (1988) found analyses at higher taxonomic levels to be 
advantageous. Warwick indicated that identification below the familial level provided no 
additional information relative to these coarser taxonomic levels, and that even data at the phylum 
level led to only minor loss of information with respect to site-to-site discrimination. 

Only cores from the epilimnetic zone were used to examine community structure in 
this comparison. The epilimnion was defined as ranging from the lake surface to one metre 
above the thermocline in lakes that stratified thermally. All sixty cores were included for 
lakes exhibiting no thermal stratification. The thirteen most frequently occurring taxa were 
retained for subsequent analysis to generate a 3:1 ratio of observations (i.e., lakes) to 
variables (i.e., taxa). This 3:1 ratio of samples to variables is recommended in multivariate 
analysis to increase the stability of the resultant ordination solutions (Grossman et al. 1991). 
In addition, only these thirteen taxa occurred in more than 50% of the lakes. As a result, no 
rare taxa were included in the analysis, thereby minimizing problems frequently encountered 
in the analysis of data matrices containing large numbers of zeroes (see Legendre and 
Legendre 1983 for details). The abundance data were strongly skewed so all variables were 
log(x+1) transformed prior to calculating mean abundance per taxon per lake. These 
logarithmic means for each taxon in each lake formed the basic data for all subsequent 
analyses. 

Data Analysis 

As stated above, the initial data base consisted of geometric mean counts for each of 
the thirteen taxa in the thirty-nine lakes. As a result, the 'raw' data were epilimnetic averages 
of the original logarithmically transformed counts of invertebrate abundances. To remove or 
reduce the influence of the absolute abundance, the data were standardized by dividing each 
value by the maximum value for that taxon across all lakes. This produced data ranging from 
0-1. This standardization was chosen in favour of z-scores (i.e., a mean of zero and standard 
deviation of one) because ordination methods like CA and DCA require all data to be greater 
than or equal to zero. For comparison, the data also were ranked across lakes to give equal 
weighting to each taxon. Proportions or percentage standardizations were not used because 
these transformations induce statistical characteristics which strongly alter inter-variable 
relationships (Aitchison 1984; Jackson, unpubl. data). 
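
The three forms of the data described above (log-transformed means, 0-1 scaled values, and ranks) can be illustrated with a short sketch. It assumes NumPy, base-10 logarithms, and average ranks for ties; the source specifies neither the base of the logarithm nor the treatment of ties, and the function names are hypothetical.

```python
import numpy as np

def lake_log_means(core_counts):
    """Mean of log10(x + 1) over the cores of one lake (rows = cores, columns = taxa)."""
    counts = np.asarray(core_counts, dtype=float)
    return np.log10(counts + 1.0).mean(axis=0)

def scale_to_unit_range(data):
    """Divide each taxon (column) by its maximum across lakes, giving values in 0-1."""
    data = np.asarray(data, dtype=float)
    # Assumes every retained taxon occurs in at least one lake (maximum > 0)
    return data / data.max(axis=0)

def rank_across_lakes(data):
    """Replace each taxon's values by ranks across lakes; tied values share the mean rank."""
    data = np.asarray(data, dtype=float)
    ranked = np.empty_like(data)
    for col in range(data.shape[1]):
        values = data[:, col]
        order = np.argsort(values, kind="stable")
        ranks = np.empty(len(values))
        ranks[order] = np.arange(1, len(values) + 1)
        for v in np.unique(values):
            tied = values == v
            ranks[tied] = ranks[tied].mean()
        ranked[:, col] = ranks
    return ranked
```

Both standardizations operate within taxa, so each taxon contributes on a comparable scale regardless of how abundant it is overall.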

Similarity/Distance Measures 

Distance matrices were calculated from each of the three data matrices. The Euclidean 
distance measure was chosen because it does not involve any standardization and thereby uses 
absolute rather than relative abundance of species. Euclidean distance is calculated as 
follows: 

$$ED_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ki} - x_{kj})^2}$$

where sites i and j are compared for n possible species. Euclidean distance is influenced 
strongly by the absolute magnitude of species' abundances and the correlation between 
species. The coefficient is unbounded at its upper range (i.e., the maximum value depends on 
the absolute abundance of the data being compared). 
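
A small numerical sketch shows why this measure is sensitive to absolute abundance. It assumes NumPy; the function name `euclidean_distance` and the example values are illustrative only.

```python
import numpy as np

def euclidean_distance(site_i, site_j):
    """Euclidean distance between two sites described by the same n species."""
    a = np.asarray(site_i, dtype=float)
    b = np.asarray(site_j, dtype=float)
    return float(np.sqrt(((a - b) ** 2).sum()))

# Multiplying every abundance by ten multiplies the distance by ten,
# even though the relative composition of the two sites is unchanged.
small = euclidean_distance([0, 3], [4, 0])
large = euclidean_distance([0, 30], [40, 0])
```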

The second coefficient was the Bray-Curtis measure because of its frequent use in 
benthic-invertebrate and plant community studies (see Introduction). The measure includes an 
implicit standardization down-weighting the most abundant taxa in the following form: 

$$BC_{ij} = \frac{\sum_{k=1}^{n} |x_{ki} - x_{kj}|}{\sum_{k=1}^{n} (x_{ki} + x_{kj})}$$

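
The Bray-Curtis measure (the sum of absolute differences between two sites divided by the sum of their abundances) can be written in a few lines. The sketch below assumes NumPy and non-negative abundances with at least one non-zero value; the function name is hypothetical.

```python
import numpy as np

def bray_curtis(site_i, site_j):
    """Bray-Curtis dissimilarity between two sites (0 = identical, 1 = no shared taxa)."""
    a = np.asarray(site_i, dtype=float)
    b = np.asarray(site_j, dtype=float)
    return float(np.abs(a - b).sum() / (a + b).sum())
```

Because the numerator and denominator scale together, multiplying both sites by a constant leaves the coefficient unchanged, and the value is bounded between 0 and 1.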
The third measure was the cosine of the angle between the multidimensional vectors 
for each lake. The cosine coefficient is not commonly used by community ecologists (e.g., 
Smith et al. 1990), but it is monotonic to standardized measures like the chord distance 
(Orloci 1978). Cosine distance has been used in morphometric analyses because it is 
invariant with respect to body size or, in the case of this study, invariant with respect to 
the magnitude of species abundance. Calculations for the cosine distance are determined 
using: 

$$COS_{ij} = \frac{\sum_{k=1}^{n} x_{ki} x_{kj}}{\sqrt{\sum_{k=1}^{n} x_{ki}^2 \sum_{k=1}^{n} x_{kj}^2}}$$
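
The invariance of the cosine measure to the magnitude of abundance, and its monotonic relationship with the chord distance, can be checked numerically. The sketch assumes NumPy and uses the standard identity chord = sqrt(2(1 - cos)); the function names are hypothetical.

```python
import numpy as np

def cosine_similarity(site_i, site_j):
    """Cosine of the angle between two site vectors."""
    a = np.asarray(site_i, dtype=float)
    b = np.asarray(site_j, dtype=float)
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

def chord_distance(site_i, site_j):
    """Chord distance, a decreasing function of the cosine."""
    c = cosine_similarity(site_i, site_j)
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - c))))
```

Two sites with the same relative composition, such as (1, 2, 3) and (10, 20, 30), have a cosine of one and a chord distance of zero regardless of their total abundance.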



Correlation and covariance matrices were calculated also. The standard product-moment 
correlation implicitly standardizes the data for differences in absolute abundance. In 
contrast, the covariance does not employ a standardization and, therefore is equivalent to the 
Euclidean distance measure with respect to data standardization. As a result, the correlation 
matrix is influenced less by the magnitude of species abundance than the covariance matrix. 
Differences between the unstandardized covariance measure and the standardized correlation 
coefficient are apparent in the following formulae: 



$$COV_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$$

$$COR_{ij} = \frac{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)^2 \sum_{k=1}^{n} (x_{kj} - \bar{x}_j)^2}}$$
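
The contrast between the two coefficients can be made concrete with a short sketch: rescaling one variable changes the covariance in proportion but leaves the correlation untouched. The code assumes NumPy and the usual n - 1 divisor for the covariance; the function names are hypothetical.

```python
import numpy as np

def covariance(a, b):
    """Sample covariance with the n - 1 divisor."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1))

def correlation(a, b):
    """Product-moment correlation: cross-products scaled by the sums of squares."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))
```

A PCA of the covariance matrix therefore gives more weight to abundant, variable taxa, whereas a PCA of the correlation matrix weights all taxa equally.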

The final measure was the chi-squared distance. This measure has been used 
explicitly as a distance measure for cluster analysis and ordination (e.g., Pontasch and 
Brusven 1988) or implicitly in CA and DCA (although frequently without being recognized). 
The chi-squared measure incorporates a standardization which emphasizes relative species 
abundances rather than absolute abundance. However, this measure is susceptible to over- 
emphasizing rare species in sites with low taxonomic richness. The distance measure is 
calculated as: 



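$$\chi^2_{ij} = \sqrt{\sum_{k=1}^{n} \frac{1}{x_{k.}} \left( \frac{x_{ki}}{x_{.i}} - \frac{x_{kj}}{x_{.j}} \right)^2}$$

where x_{k.} is the total abundance of species k across all sites and x_{.i} is the total 
abundance of all species at site i. 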
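
One common form of the chi-squared distance compares the relative species profiles of two sites and weights each squared difference by the inverse of the species total. The sketch below assumes that form, NumPy, and a table with species as rows and sites as columns; the function name is hypothetical.

```python
import numpy as np

def chi_squared_distance(table, i, j):
    """Chi-squared distance between sites i and j (rows = species, columns = sites)."""
    table = np.asarray(table, dtype=float)
    species_totals = table.sum(axis=1)  # total of each species over all sites
    site_totals = table.sum(axis=0)     # total of all species at each site
    profile_i = table[:, i] / site_totals[i]
    profile_j = table[:, j] / site_totals[j]
    return float(np.sqrt((((profile_i - profile_j) ** 2) / species_totals).sum()))
```

Dividing by the species total is what gives rare species their large influence: a small total inflates that species' contribution to the distance.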



Ordination Methods 

Different ordination methods were used in conjunction with the three data 
standardizations and six types of distance measures. Principal coordinates analysis or metric 
multidimensional scaling was used with distance matrices based on Bray-Curtis, Euclidean, 
and cosine measures for each of the three standardized data matrices (i.e., a total of nine 
combinations for each ordination method). These same distance matrices were used in 
nonmetric multidimensional scaling. Both three-dimensional random configurations and 
PCoA solutions were used as initial input configurations for NMDS. In all cases, the random 
and PCoA solutions led to similarly stable NMDS solutions (i.e., no change in stress values 
with a minimum of 50 additional iterations), so only the results based on PCoA-initial 
configurations are presented. Initial configurations based on metric solutions are 
recommended, particularly to avoid the problem of local minima (Ludwig and Reynolds 1988; 
although this is frequently debated, e.g., Kenkel and Orloci 1986). Principal components 
analysis was used to analyze the correlation and covariance matrices arising from each of the 
three data standardizations (i.e., a total of six separate analyses). Correspondence analysis 
and the related detrended correspondence analysis were used with each form of data 
standardization and the implicit chi-squared measure of similarity (only three analyses per 
ordination method). PCoA, NMDS, and PCA analyses were completed using the NTSYS/PC 
package (Rohlf 1989). CANOCO (ter Braak 1987) was used for the CA and DCA using the 
default settings for detrending by segments and ignoring the potential problems of multiple 
DCA solutions (Jackson and Somers 1991). 
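
For readers unfamiliar with principal coordinates analysis, the classical algorithm can be sketched as follows. This is a generic textbook version written with NumPy, not the NTSYS/PC implementation used here; the function name `pcoa` and the choice to clip negative eigenvalues at zero are assumptions of the sketch.

```python
import numpy as np

def pcoa(D, n_axes=3):
    """Classical principal coordinates analysis of a distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the matrix of -0.5 * squared distances
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    evals, evecs = np.linalg.eigh(B)
    # Keep the axes with the largest eigenvalues
    order = np.argsort(evals)[::-1][:n_axes]
    evals, evecs = evals[order], evecs[:, order]
    # Non-Euclidean measures can give negative eigenvalues; they are clipped here
    return evecs * np.sqrt(np.clip(evals, 0.0, None))
```

With Euclidean distances the resulting coordinates reproduce the original inter-site distances, which is why PCoA of a Euclidean matrix and PCA of the covariance matrix give equivalent configurations.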

Comparing Ordination Solutions 

The first three dimensions from each ordination solution were retained for subsequent 
comparisons because these axes summarize the greatest proportion of variance in each 
analysis and three axes is the limit to the number of axes that can be compared visually. The 
resultant three-dimensional ordinations of the thirty-nine lakes were compared using bivariate 
and multivariate summaries. Scores from the first axis of each analysis were contrasted using 
Spearman rank correlations. Similarly, scores on the second and third axes were correlated 
between analyses. This approach examined similarities or differences in the rank ordering of 
the positions of points (i.e., lakes) on the three ordination axes. To assess the role of overall 
benthic abundance on the ordinations, the sum of the geometric means of the taxa within each 
lake was correlated with the scores for each lake on each multivariate axis. 
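
The axis-by-axis comparison amounts to ranking the lake scores on each axis and correlating the ranks. A minimal sketch, assuming NumPy and average ranks for ties (the function names are hypothetical):

```python
import numpy as np

def average_ranks(values):
    """Ranks from 1 to n, with tied values sharing the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(scores_a, scores_b):
    """Spearman rank correlation between two sets of axis scores."""
    ra = average_ranks(scores_a)
    rb = average_ranks(scores_b)
    return float(np.corrcoef(ra, rb)[0, 1])
```

Because the sign of an ordination axis is arbitrary, a coefficient near -1 indicates the same ordering of lakes as a coefficient near +1, with the axis simply reflected.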

Because configurations of points in one ordination simply might be rotated to lie on 
other axes in a different ordination, a multivariate summary of the similarity between 
ordinations is preferable as an overall measure of ordination concordance. Procrustes analysis 
is the preferred method for such comparisons (Gower 1971; Minchin 1987; Digby and 
Kempton 1987; Jackson and Somers 1991) where the goodness-of-fit measure between these 
matrices is the m² sum-of-squared deviations statistic (Gower 1971, 1975). This m² value is a 
metric measure of concordance when both matrices are standardized prior to the Procrustean 
rotation. The m² statistic was calculated between each pair of three-dimensional ordination 
solutions to produce a matrix of m² distances between all thirty 
standardization-distance-ordination combinations. This resultant 30-by-30 matrix of m² distances was ordinated using 
PCoA, permitting a graphical assessment of the multidimensional similarity of the thirty sets 
of ordination results. A minimum spanning tree was superimposed on the PCoA to show 
which multivariate combinations were most similar. 
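
The comparison of ordinations described above can be outlined in code: compute m² between every pair of solutions, then link the most similar solutions with a minimum spanning tree. The sketch assumes NumPy, a symmetric (standardized) Procrustes statistic, and Prim's algorithm for the tree; all function names are hypothetical, and the ordination of the resulting matrix by PCoA is omitted.

```python
import numpy as np

def procrustes_m2(X, Y):
    """Symmetric Procrustes residual m^2 between two standardized configurations."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return 1.0 - s.sum() ** 2

def m2_matrix(ordinations):
    """Symmetric matrix of m^2 values between all pairs of ordination solutions."""
    k = len(ordinations)
    M = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            M[a, b] = M[b, a] = procrustes_m2(ordinations[a], ordinations[b])
    return M

def minimum_spanning_tree(D):
    """Prim's algorithm; returns the tree as a list of (i, j) index pairs."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or D[i, j] < D[best]:
                    best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges
```

With thirty ordination solutions, `m2_matrix` returns the 30-by-30 matrix, and the tree contains twenty-nine edges joining each solution to a near neighbour.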

RESULTS 

PCoA and NMDS Solutions 

Analyses based on Bray-Curtis and Euclidean distance measures provided similar rank 
orderings of the lakes for the first two axes regardless of the choice in data standardization or 
whether PCoA or NMDS were used (Table 3.1). All of the rank correlations exceeded 0.90 
for comparisons between the first axes and most exceeded 0.80 for comparisons between the 
second axes. The many combinations of data standardization, distance measure and 
ordination produced similar patterns in the plots of the first two ordination axes (Fig. 3.1). 
However, the actual positioning of points varied among the plots. The most obvious 
difference was that of lake 'N' (all lakes were given upper- or lower-case codes to facilitate 
visual comparisons) located on the left-hand side of the plots based on Bray-Curtis and 
Euclidean distance. Although the rank ordering remained consistent, this lake appeared to 
differ from the other lakes in comparisons based on the Bray-Curtis coefficient, particularly 
when used with NMDS. The Bray-Curtis and Euclidean solutions both differed considerably 
from the cosine results. Almost all correlations between the cosine-based 
ordinations and those from other distance measures indicated little consistency in the 
rank order of the lakes on the axes (Table 3.1). As well, the scatterplots of the first two axes 
for cosine-based ordinations show little resemblance to those from Bray-Curtis or Euclidean 
measures (Fig. 3.1). These differences were most evident for PCoA and NMDS results using 
ranked data with cosine distances. With this particular combination of choices, the cosine 
results resembled those found in the first axis using CA and DCA (Table 3.1; Fig. 3.2). 

The PCoA and NMDS ordinations using Bray-Curtis and Euclidean distances showed 
strong rank correlations with the total abundance of taxa (Table 3.1). This result was evident 
irrespective of the choice in data standardization. Cosine-based ordinations did not show high 
correlations with benthic abundance. 

For PCoA solutions, the Bray-Curtis coefficient provided the most consistent results as 
measured by the average m² distance (Table 3.2). Both the Euclidean and cosine distances 
exhibited a greater degree of inconsistency than the Bray-Curtis results. Examination of the 
relative positions of the ordinations using a minimum spanning tree (Fig. 3.2) showed that the 
Euclidean and cosine solutions are positioned at greater distances from one another than the 
Bray-Curtis solutions. Greater separation in the Procrustean summary and larger average m² 
distances indicate less consistency in the results from the different analyses. Similar results 
were found for comparisons based on NMDS ordination, except that the Bray-Curtis solutions 
were more consistent than those found in the PCoA-based ordinations (i.e., smaller average 
m²; Table 3.2, Fig. 3.2). The various Euclidean- and cosine-based results for PCoA were 
slightly more consistent than those from comparable NMDS ordinations. Overall, for all 
coefficients and data standardizations, both PCoA and NMDS ordinations provided similar 
levels of consistency (i.e., average m² = 0.524 and 0.528, respectively). 

Table 3.1. Spearman rank correlation coefficients for pair-wise comparisons between the first 
axes from the ordinations (upper triangle) and between the second axes (lower triangle). 

NMDS-COR - - - 

The ordinations are coded as: principal coordinates analysis (PCoA), nonmetric multidimensional 
scaling (NMDS), principal components analysis (PCA), correspondence analysis (CA), and detrended 
correspondence analysis (DCA). Distance measures are coded as a suffix to the ordination and include 
Bray-Curtis (BC), Euclidean distance (EU), cosine distance (CO), correlation (CR), and covariance 
(CV). Standardizations include data scaled to range between 0 and 1 (S) and ranked data (R). 
Log-transformed data are shown without a code. An example code of PCoA-COR is a principal 
coordinates analysis based on a cosine distance matrix calculated from ranked variables. Only 
coefficients exceeding 0.70 are shown and all values are multiplied by 100. Size is the total 
geometric abundance of the taxa. 



PCA-COR PCA-COV CA DCA Size 

S R SR SR S R 

PCoA-EU - - -100 - - _______ 

PCoA-EUS - - --100 - 85-83---- 

PCoA-EUR 87 100 100 - 100 _______ 

PCoA-BC ______ _______ 

PCoA-BCS - - -- - - _ 79 ____.. 

PCoA-BCR -79 79 - - 79 _______ 

PCoA-CO - - __ _ _ 76 ______ 

PCoA-COS 87 72 72 - - 72 _______ 

PCoA-COR - - --77 - _ _ 97 - _ - - 

NMDS-EU - ___ _ _ _______ 

NMDS-EUS - - --80 - 77-75---- 

NMDS-EUR 82 95 95 - - 95 _______ 

NMDS-BC - - __ _ _ _______ 

NMDS-BCS go----- _______ 

NMDS-BCR 82 8080- -80 _______ 

NMDS-CO - - - - - _______ 

NMDS-COS 83 - -- _ _ _______ 

NMDS-COR - - __ _ _ __78____ 

PCA-COR PCA-COV CA DCA Size 

SR SR SR SR 

PCA-COR 87 87--- _______ 

PCA-CORS -- -100 _______ 

PCA-CORR - -100 _______ 
PCA-COVR - - - - - 

CA _____ 

CA-S - - - 

CA-R _ _ _ 

DCA _ _ _ 

DCA-S 

Size 

Figure 3.1. Scatterplots of the first two axes from each combination of data standardization, 
resemblance measure, and ordination. The ordinations are coded as: principal coordinates 
analysis (PCoA), nonmetric multidimensional scaling (NMDS), principal components analysis 
(PCA), correspondence analysis (CA), and detrended correspondence analysis (DCA). 
Distance measures are coded as a suffix to the ordination and include Bray-Curtis (BC), 
Euclidean distance (EU), cosine distance (CO), correlation (CR), and covariance (CV). 
Standardizations include data scaled to range between 0 and 1 (S) and ranked data (R). 
Log-transformed data are shown without a code. An example code of PCoA-COR is a principal 
coordinates analysis based on a cosine distance matrix calculated from ranked variables. 
Figure 3.2. Axes one and two of a principal coordinates analysis. The distance matrix was calculated using the metric m² statistic
from a standardized Procrustes analysis. The solid connecting line is a minimum spanning tree in which nearest neighbours are
connected. The dashed lines connect the proper names to their respective points.
Although many of the PCoA and NMDS ordinations led to similar results, the
Bray-Curtis coefficient differed depending upon the choice of ordination (Fig. 3.2). As a result, the
consistency of the various coefficients was similar when averaged across both types of
ordination. All three measures had similar average m² values (i.e., Bray-Curtis = 0.31;
Euclidean = 0.34; cosine = 0.34).

Standardizing the data to ranks prior to analysis led to more consistent results
irrespective of the choice of distance measure or whether PCoA or NMDS was used (Table
3.3). For PCoA solutions, the use of data standardized to range between 0 and 1 led to a more
consistent set of results than leaving data as log-transformed values. However, no difference
was found for the NMDS results.

PCA Solutions 

Results based on principal components analysis showed considerable variation 
depending on the data standardization used and, in particular, the choice between the 
correlation and covariance matrix. All solutions showed strong rank correlations between the 
first axis and the measure of benthic abundance (Table 3.1). The first axes from different 
solutions were correlated with one another, as were the second axes. A multivariate 
comparison based on the m² measure showed considerable differences between the correlation
and covariance solutions (Fig. 3.2). The correlation-based results showed the most consistent
results of any distance measure (average m² = 0.106), whereas the covariance results
Table 3.2. Average m² values for all pair-wise comparisons based on the combination of
ordination method and distance measure. For example, the value .362 represents the average
m² distance for the three types of standardized data used in a PCoA with the Bray-Curtis
distance measure. The larger the value observed, the less consistent the results obtained from
the various ordinations. The overall averages represent the average m² value for all
standardizations and distance measures used with each ordination method.

Ordination         PCoA                 NMDS                 PCA           CA     DCA
Distance measure   BC     EU     CO     BC     EU     CO     CORR   COV
Average m²         .362   .508   .495   .173   .520   .587   .106   .508
Overall average    .524                 .528                 .307          .201   .476
demonstrated the least consistency (average m² = 0.508). The covariance-based PCA results
were identical to those found for the PCoA with Euclidean distances, provided that the same
data standardization was used (Figs. 3.1 and 3.2). From this correspondence between the
PCoA and PCA solutions, it followed that the correlation coefficient provided a more
consistent set of ordinations for different data standardizations relative to the linear (i.e.,
PCoA and PCA) and nonmetric (NMDS) ordinations based on other distance measures. In
combination, the PCA results using correlation and covariance showed greater consistency
than the PCoA solutions.
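
The equivalence between a covariance-based PCA and a PCoA of Euclidean distances can be checked numerically. The sketch below is illustrative only: the data are randomly generated (not the benthic data), the variable names are hypothetical, and PCoA is implemented by the usual double-centring of the squared distance matrix. Axis signs are arbitrary, so absolute scores are compared.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 10 sites by 4 variables with unequal variances.
data = rng.normal(size=(10, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
centered = data - data.mean(axis=0)

# PCA on the covariance matrix, obtained from the SVD of the centred data.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pca_scores = u * s

# PCoA: double-centre -0.5 * squared Euclidean distances, then eigen-decompose.
diff = data[:, None, :] - data[None, :, :]
sq_dist = np.sum(diff ** 2, axis=2)
n = len(data)
centering = np.eye(n) - np.ones((n, n)) / n
gram = -0.5 * centering @ sq_dist @ centering
eigval, eigvec = np.linalg.eigh(gram)
order = np.argsort(eigval)[::-1][:4]          # four non-trivial axes
pcoa_scores = eigvec[:, order] * np.sqrt(eigval[order])

# The two sets of site scores agree up to the sign of each axis.
same = np.allclose(np.abs(pca_scores), np.abs(pcoa_scores), atol=1e-8)
print("PCA (covariance) and PCoA (Euclidean) scores agree:", same)
```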

In contrast to the PCoA and NMDS comparisons, the log-transformed data provided
more consistent results between PCA solutions than data standardized to range between 0-1 or
the ranked data (i.e., m² = 0.381, 0.465, and 0.474 respectively).

CA and DCA Solutions 

CA, DCA, and the PCoA of cosine distances using 0-1 scaled and ranked data
produced ordination solutions that were quite similar and contrasted with those from other
PCA, PCoA, and NMDS results (Table 3.1; Fig. 3.2). Correlations were low between the
CA-DCA results and those from PCoA or NMDS based on Bray-Curtis or Euclidean distance.
There were high correlations between the various CA and DCA solutions using the first axis
(Table 3.1; Fig. 3.1), but none of the CA and DCA axes was highly correlated with overall
benthic abundance. The CA results demonstrated a much greater degree of consistency than
those from DCA (average m² = 0.201 and 0.476 respectively; Table 3.2). There were
reversals in the orientation of some axes (e.g., second axis for the ranked data; Fig. 3.1),
although this is unimportant as the orientation (i.e., positive versus negative direction) is
arbitrary.

Irrespective of the method of ordination used, an overall evaluation of the importance
of the choice of data standardization showed that results from ranked data were most consistent
(m² = 0.403), followed by data scaled to a 0-1 range (m² = 0.454); the least consistent
standardization was the base-10 logarithmic transformation (m² = 0.497; Table 3.3). No
comparisons of distance were made with CA and DCA as both implicitly incorporate the
chi-squared measure.
Table 3.3. Average m² distance values for all pair-wise comparisons using that combination
of ordination method and data standardization.

                   Ordination Method
Standardization    PCoA    NMDS    PCA     Overall
log10(x+1)         .565    .510    .381    .497
Range (0-1)        .513    .511    .465    .454
Ranks              .432    .452    .474    .403
DISCUSSION

Previous studies that compared multivariate methods were based on either simulated
data (e.g., Swan 1970; Austin 1976a,b; Fasham 1977; Minchin 1987) or field data (Oksanen
1983; Leland et al. 1986; Smith et al. 1988). The former method allows one to specify a 
priori a particular data structure and subsequently evaluate whether the ordination methods 
recover this structure. However, one must be concerned as to whether such simulated data 
adequately represent true field conditions (Austin 1980). The use of field data is based on 
'natural' patterns, but one cannot be certain that results are summarized 'correctly' by any 
one ordination (e.g., Minchin 1987). Obviously, the choice of data standardization, distance 
measure, and ordination technique that best summarize the 'true' patterns is most desirable. 
However, given that we cannot know a priori what these 'true' patterns are in field data, we 
must deduce the effects of such choices by comparing results from a number of analyses. 

Two-dimensional plots are frequently used to examine these types of differences (e.g., 
Kenkel and Orloci 1986; Leland et al. 1986). However, such interpretations may be adversely 
influenced by points near the end of an axis as these points are noticed more readily. 
Methods such as Procrustes analysis permit a more objective measure of the overall similarity 
between the results. The greater the average m² distance between ordination results, the
greater the difference between the ordinations.
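
As an illustration of how such a comparison behaves, the sketch below uses `scipy.spatial.procrustes`, which centres and scales both configurations, finds the best-fitting rotation, and returns a residual sum of squares between 0 and 1. Treating that value as the m² statistic is an assumption made here for illustration; the study's own program and standardization may differ in detail, and the scores are randomly generated rather than taken from the ordinations above.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)

# Hypothetical ordination scores: 40 sites on two axes.
scores_a = rng.normal(size=(40, 2))

# The same configuration after rotation, rescaling, and translation.
theta = np.pi / 3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
scores_b = 2.5 * scores_a @ rotation + 10.0

# An unrelated configuration for contrast.
scores_c = rng.normal(size=(40, 2))

# The third return value is the residual sum of squares after fitting.
_, _, m2_same = procrustes(scores_a, scores_b)
_, _, m2_diff = procrustes(scores_a, scores_c)

print(f"m2, same pattern in a different orientation: {m2_same:.6f}")
print(f"m2, unrelated configurations:                {m2_diff:.6f}")
```

Two ordinations that differ only in orientation, scale, or origin give a value near 0, whereas unrelated configurations give a value approaching 1.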
Based on Procrustes analysis in this study, there are two groups of competing
multivariate solutions and several other relatively different ordination results. The first group
includes all CA and DCA ordination results plus results based on the cosine distance (with
the exception of the log-transformed cosine results). Although the CA, DCA, and most 
cosine-based solutions represent a strong contrast to the other ordinations, they differ in their 
degree of consistency with respect to data standardization. Only CA demonstrated consistent 
results when different data standardizations were used. Results from DCA were expected to 
show a similar trend, but DCA provided only an intermediate level of consistency. Different 
sources of instability in DCA results have been reported by Oksanen (1988) and Jackson and 
Somers (1991) and such instability could contribute to this inconsistency. However, the 
actual cause for the inconsistency in this study is not readily apparent. 

Solutions based on PCA, PCoA, and NMDS are influenced by both data 
standardization and distance measure. With some of these methods, the choice of data 
standardization appears more influential than the choice of distance measure. For example, 
the Euclidean and cosine distances are strongly influenced by whether the data are rescaled, 
whereas the Bray-Curtis measure is affected by data standardization to a lesser extent. 
Perhaps not surprisingly, the PCoA and NMDS solutions using the same distance measures 
resemble one another greatly. 

Ordinations using log-transformed data tend to show a greater degree of variation than
analyses using data standardized to range between 0 and 1. Ranked taxonomic abundance
results provide the most consistent form of standardization. This manner of standardization
will linearize relationships between taxa exhibiting monotonic patterns in abundance (i.e.,
pairs of taxa having steadily increasing or decreasing abundances across sites). Results from
PCoA and NMDS solutions show the greatest consistency with ranked data. Paradoxically,
PCA shows the lowest consistency with ranked data. Results from the correlation-based PCA
are quite consistent irrespective of the type of data standardization. This consistency may be
attributable, in part, to the fact that solutions based on data scaled to range between 0 and 1
and rank-abundance data are identical. The resulting m² distance between these two solutions
is 0, thereby lowering the average m² considerably.

The interaction between data standardization and distance measure appears to be the 
primary factor influencing the resulting patterns in PCA, PCoA, and NMDS analyses. For 
any given standardization, solutions for these three ordinations diverge when different 
distance measures are used. Although the effect of data standardization is moderated by the 
choice of distance measure, Euclidean distance, the covariance, and the cosine distance are 
most strongly affected. In contrast, relative measures like the correlation coefficient and 
Bray-Curtis distance show less influence due to data standardization. As a result, the choice 
of distance appears to be of greater significance than the type of ordination when using 
PCoA, NMDS, or PCA. This result is not unexpected given that each distance measure 
incorporates different implicit standardizations (e.g., compare formulae for the correlation and 
Bray-Curtis coefficients). 
The use of 0-1 scaling or ranking led to greater consistency among most ordinations.
Consequently, special attention to the choice of data standardization is required. Justification
for such choices (or lack thereof) is important. CA and PCA based on a correlation matrix
were least affected by data standardization although the two approaches provided very
different solutions. Correspondence analysis, DCA, and ordinations using the cosine distance
and 0-1 scaled or ranked data provided solutions more similar to one another than those based
on other choices with PCoA, NMDS, and PCA. NMDS results were quite similar to PCoA
solutions and many of the CA, DCA, and PCA results depending upon which distance
measure was selected (cf. Kenkel and Orloci 1986). Hence, on the basis of these results,
there is no advantage conferred by using the more complicated NMDS instead of its metric 
counterpart. This finding supports the recommendation by Digby and Kempton (1987) that 
researchers employ metric rather than nonmetric multidimensional scaling techniques. 

The choice of distance measure and ordination leads to an implicit choice about 
whether the significance of abundance and the corresponding 'size' axis in community data is 
informative (Boecklen and Price 1989; Jackson et al. 1989). If absolute abundance is 
important, then ordinations using Euclidean distance or a covariance matrix will emphasize 
this attribute of the data (but also note the increased divergence in such ordinations when 
differing data standardizations are employed - see Fig. 3.2). Standardized coefficients, such 
as the correlation or Bray-Curtis coefficient, are also influenced by the abundance of taxa, but 
to a lesser extent. Should one wish to remove the community-size gradient associated with 
total abundance, CA will summarize variation in relative abundance with little influence of 
data standardization. Alternatively, the 'size' or abundance axis of the absolute-abundance
ordinations can be ignored and patterns of relative abundance in subsequent axes can be
examined. However, correspondence analysis combines the attributes of being free of the
effect of total abundance while providing highly consistent results regardless of data
standardization. Although DCA and some cosine-based ordinations provided similar solutions
to CA, these results varied considerably according to the choice of data standardization.
Chapter 4: STOPPING RULES IN PRINCIPAL COMPONENTS ANALYSIS: A
COMPARISON OF HEURISTICAL AND STATISTICAL APPROACHES



ABSTRACT 

Approaches to determining the number of eigenvalues or components to interpret from 
principal component analysis were compared. Heuristic procedures included: retaining 
components with eigenvalues greater than one (i.e., Kaiser-Guttman criterion); components 
with bootstrapped eigenvalues greater than one (bootstrapped Kaiser-Guttman); the scree plot; 
the broken-stick model; and components with eigenvalues totalling a fixed amount of the
total variance. Statistical approaches included: Bartlett's test of sphericity; Bartlett's test of
homogeneity of the correlation matrix; Lawley's test of the second eigenvalue; bootstrapped
confidence limits on successive eigenvalues (i.e., significant differences between eigenvalues);
and bootstrapped confidence limits on eigenvector coefficients (i.e., coefficients that differ
significantly from zero). All methods were compared using simulated data matrices of
uniform correlation structure, patterned matrices of varying correlation structure, and data sets
of lake morphometry, water chemistry, and benthic invertebrate abundance. The most
consistent results were obtained from the broken-stick model and a combined measure using 
bootstrapped eigenvalues and associated eigenvector coefficients. The traditional and 
bootstrapped Kaiser-Guttman approaches overestimated the number of non-trivial dimensions 
as did the fixed-amount-of-variance model. The scree plot consistently estimated one 
dimension more than the number of simulated dimensions. Bartlett's test of sphericity
showed inconsistent results. Both Bartlett's test of homogeneity of the correlation matrix and
Lawley's test are limited to testing for only one and two dimensions respectively.



INTRODUCTION 

The vast number of species and sites considered in community ecology has led most 
ecologists to adopt multivariate methods as an essential part of their analyses (Legendre and 
Legendre 1983; Digby and Kempton 1987). Because many multivariate methods summarize 
large, multidimensional data sets into two or three dimensions, ecologists can examine the 
dominant trends of inter-specific and inter-site covariation. Without multivariate methods, 
studies of plant and animal communities would be limited by the overwhelming amount of 
data and large number of pairwise species or site combinations. Among the potential 
multivariate methods, ordination is used frequently to: (1) maximize the amount of variation 
in species data summarized in a minimal number of dimensions for graphical presentation; (2) 
summarize correlated variation prior to cluster analysis; (3) remove multi-collinearity between 
environmental variables prior to multiple regression; and (4) directly examine relationships 
between species and environmental data. 

Although ecologists have compared the results of various ordination methods (see 
Gauch 1982a; Pielou 1984; Digby and Kempton 1987; Minchin 1987 for comparisons), few
guidelines exist to evaluate how many ordination axes should be considered non-trivial and
interpretable. An implicit assumption in the use of ordination methods is that the experienced
ecologist can separate meaningful patterns from random noise (i.e., ecologically meaningful
information versus sampling variation or measurement error; Gauch 1982b). An ability to
distinguish 'signal' from 'noise' is essential and, in a statistical sense, these decisions provide
'stopping rules'. The failure to distinguish between signal and noise may lead to the rejection
of useful information or the interpretation of ecologically meaningless information. In the
former case, a loss of information may limit our understanding of ecological processes. In
the latter case, erroneous conclusions may result. This study examines the issue of assessing
multivariate data dimensionality using both heuristical and statistical approaches. In the
process of comparing various approaches to determining non-trivial components, a measure is
proposed that combines estimates from confidence limits associated with the eigenvalues
and the eigenvector coefficients from a bootstrapped principal components analysis. These
confidence limits provide a heuristic guide for determining both the underlying
dimensionality of the data and how many components should be retained for subsequent 
interpretation. This study is restricted to principal components analysis (PCA) as it represents 
one of the simplest and most commonly used multivariate ordination methods. In many 
instances, these results can be extrapolated to related multivariate techniques. 

A parallel analysis of field and simulated data is used to examine the implications of
different: (i) degrees or strength of inter-variable correlations; (ii) numbers of variables; and
(iii) structure within correlation matrices (i.e., blocks of correlated variables which are
uncorrelated with other variables). These three conditions are important factors in
determining the success of a multivariate analysis. The types of data ecologists analyze often
lead to different degrees of correlation. For example, variables measuring the morphology of
organisms or lakes are often strongly correlated, whereas correlations among the
abundances of organisms may be weaker. Studies often differ in their ratio of the number of
observations relative to variables in the analysis. Although some researchers recognize the
importance of having a ratio of 3:1 or greater (see Grossman et al. 1991), it is not uncommon
to find studies having more variables than observations. Often the implication for this latter 
situation remains unrecognized. Within data sets, there may be groups of variables (e.g., 
species) which are highly correlated with one another, but uncorrelated with other groups. 
The within- and between-group structure also contributes to substantial effects within 
principal components analysis. This study examines methods of determining how many 
components should be considered as non-trivial when the characteristics of the data matrices 
are varied. The use of simulated data permits the various methods to be evaluated with data 
of predetermined dimensionality. 
METHODS



Data Matrices • Ecological 



Three ecological data sets were used for comparisons with the simulated data. The
first data set was based on four lake morphological variables of 40 lakes from south-central
Ontario. These variables were strongly associated with one another and generally had
correlations between 0.6 and 0.9. The second matrix included measurements on 12 chemical
elements or compounds from the lakes. Correlations for this matrix ranged from near 0 to
0.7. The third matrix comprised abundance measurements on 32 benthic invertebrate taxa
from the same 40 lakes. The correlations between these taxa varied between 0 and 0.7 with
most correlations between 0.3 and 0.4. All data in these ecological data sets were transformed
to linearize inter-variable relationships and approximate normal distributions (details in
Chapter 5 and Jackson and Harvey 1992).

Simulated • Uniform Correlation 

Normally distributed data were simulated to match the number of variables in the three
ecological data sets. Data matrices were constructed with 4, 12, and 32 variables and 1000
observations. Matrices were simulated having three levels of overall correlation structure.
For each population, the correlations were uniformly generated to be 0, 0.3, or 0.8 for all
off-diagonal correlations (i.e., R0, R3 and R8). Simulations were conducted until all correlations
were within +/- 0.05 of these target values. This approach generated matrices having no
interpretable dimensions (i.e., R0), a weak one-dimensional structure (R3), and a strong
one-dimensional structure (R8). From each population, three replicates of 40 observations each
were sampled.
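
A minimal sketch of this simulation scheme is given below. It is not the program used in the study: the function names are hypothetical, the multivariate normal generator is NumPy's, and drawing each replicate without replacement is an assumption, since the text does not specify how replicates were sampled. A four-variable R3 matrix is used so that the acceptance step converges quickly.

```python
import numpy as np

def uniform_corr(p, r):
    """p x p correlation matrix with every off-diagonal entry equal to r."""
    return np.full((p, p), r) + (1.0 - r) * np.eye(p)

def simulate_population(p, r, n=1000, tol=0.05, rng=None, max_tries=10_000):
    """Redraw n observations until every sample correlation is within tol of r."""
    rng = np.random.default_rng() if rng is None else rng
    target = uniform_corr(p, r)
    off_diagonal = ~np.eye(p, dtype=bool)
    for _ in range(max_tries):
        data = rng.multivariate_normal(np.zeros(p), target, size=n)
        sample_corr = np.corrcoef(data, rowvar=False)
        if np.all(np.abs(sample_corr[off_diagonal] - r) <= tol):
            return data
    raise RuntimeError("no acceptable matrix found; relax tol or raise max_tries")

rng = np.random.default_rng(1)
population = simulate_population(4, 0.3, rng=rng)      # analogue of matrix 4R3

# Three replicate samples of 40 observations each (sampling scheme assumed).
replicates = [population[rng.choice(len(population), size=40, replace=False)]
              for _ in range(3)]
print([rep.shape for rep in replicates])
```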



Structured Correlation 

For comparison, matrices were simulated with three-dimensional structure. These 3-d
matrices also varied as to whether each correlated submatrix was
uncorrelated or weakly correlated with the others. In the first set of simulations, 12-variable
matrices were divided into groups of four variables each. Within-group correlations were
equal to either 0.3 or 0.8 whereas the between-group correlations were equal to either 0 or 0.3
(see Fig. 4.1 for correlation matrices from simulations S-I to S-IV). This approach was used
to simulate S-I to S-III in order to examine the effect of differing degrees of strength in
correlation structure. To study the effect of groups having different numbers of constituent
variables, the matrices for S-IV used groups containing 5, 4, and 3 variables respectively.
Within-group correlations were set to 0.8 and between-group correlations to 0. From each
population, three replicates of 40 observations each were sampled.
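
The patterned population matrices can be written down directly. The sketch below uses a hypothetical helper, `block_corr`, to build a block-structured correlation matrix and lists its eigenvalues. For three groups of four variables with within-group r = 0.8 and between-group r = 0, each block contributes one eigenvalue of 1 + 3(0.8) = 3.4 and three eigenvalues of 1 - 0.8 = 0.2, which is the three-dimensional structure the simulations are meant to contain.

```python
import numpy as np

def block_corr(group_sizes, within, between):
    """Correlation matrix with `within` inside each group and `between` elsewhere."""
    p = sum(group_sizes)
    corr = np.full((p, p), float(between))
    start = 0
    for size in group_sizes:
        corr[start:start + size, start:start + size] = within
        start += size
    np.fill_diagonal(corr, 1.0)
    return corr

# Three groups of four variables, within-group r = 0.8, between-group r = 0.
equal_groups = block_corr([4, 4, 4], within=0.8, between=0.0)
# Groups of 5, 4, and 3 variables, as described for S-IV.
unequal_groups = block_corr([5, 4, 3], within=0.8, between=0.0)

eig_equal = np.sort(np.linalg.eigvalsh(equal_groups))[::-1]
eig_unequal = np.sort(np.linalg.eigvalsh(unequal_groups))[::-1]
print(np.round(eig_equal, 2))
print(np.round(eig_unequal, 2))
```

With unequal group sizes the three leading population eigenvalues are no longer equal (4.2, 3.4, and 2.6), so the smallest group is the hardest to separate from noise.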

The simulation resulted in a 3 x 3 x 3 design based on the number of variables, level 
of correlation, and replication for the uniform matrices and 4 matrices x 3 replicates for the 
variable-correlation matrices. Matrix names are coded such that the first number indicates the 
number of variables and the subsequent alphanumeric indicates the degree of correlation
within the matrix. For example, 12R3 is a 12-variable matrix with all inter-variable
correlations equal to 0.3.



Statistical Analysis 

Principal components analyses were conducted using the PRINCOMP procedure in 
SAS (SAS Institute 1989) and programs originating from Orloci (1978). Both methods 
provided the same results. All eigenvalues and eigenvector coefficients from each analysis 
were saved and univariate statistics summarized using the UNIVARIATE procedure (SAS 
Institute 1989). The approaches used to evaluate component retention and interpretation are 
detailed below. 

Methods of Component Assessment • Heuristical Approaches 

A. Kaiser-Guttman

The most common stopping rule in principal components analysis is based on the
average value of the eigenvalues (i.e., the Kaiser-Guttman criterion; Guttman 1954; Kaiser
1959; Cliff 1988). Because variables are often measured in different units, most ecologists
use a correlation matrix in PCA thereby giving each variable equal weight in the analysis. As
a result, the sum of the eigenvalues equals the number of variables. In the Kaiser-Guttman
method, eigenvalues greater than the average eigenvalue (i.e., λ > 1.0) are retained because
these axes summarize more information than any single original variable. Therefore, only
components with λ > 1.0 are interpreted. Unfortunately, a PCA of randomly generated,
uncorrelated data will produce eigenvalues exceeding one. As a result, this method has been
criticized (e.g., Karr and Martin 1981; Stauffer et al. 1985; Rexstad et al. 1986, 1988;
Grossman et al. 1991); however, the Kaiser-Guttman criterion remains the most popular
stopping rule in ecology.
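
The weakness of the criterion is easy to demonstrate. In the sketch below (illustrative only; `kaiser_guttman` is a hypothetical helper and the data are random normal deviates), a correlation-matrix PCA of 12 uncorrelated variables measured on 40 observations is evaluated with the λ > 1.0 rule. Because the eigenvalues sum to the number of variables and are never exactly equal in a sample, at least one always exceeds one.

```python
import numpy as np

def kaiser_guttman(data):
    """Eigenvalues of the correlation matrix and the number exceeding 1.0."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return eigenvalues, int(np.sum(eigenvalues > 1.0))

rng = np.random.default_rng(42)
random_data = rng.normal(size=(40, 12))   # 40 observations, 12 uncorrelated variables
eigenvalues, n_retained = kaiser_guttman(random_data)

print(np.round(eigenvalues, 2))
print("components retained from pure noise:", n_retained)
```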

B. Bootstrapped Kaiser-Guttman 

The bootstrap resampling technique (Efron 1979 and see below) was proposed as a 
means of determining the interpretability of eigenvalues by Lambert et al. (1990). They 
argued that the Kaiser-Guttman criterion was arbitrary and that it ignored the error associated with
each λ due to sampling. Consequently, an eigenvalue of 0.99 would be discarded, whereas an
eigenvalue of 1.01 would be retained even though an eigenvalue of 1.01 may have 95%
confidence limits ranging from 0.9 to 1.1. As a result, they proposed that the bootstrap 
should be used to determine how many eigenvalues had confidence limits exceeding the 1.0 
criterion (i.e., a bootstrapped Kaiser-Guttman approach). 
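
A minimal sketch of the bootstrapped Kaiser-Guttman idea follows. Several details are assumptions rather than the procedure of Lambert et al. (1990): observations (rows) are resampled with replacement, percentile limits are used for the 95% confidence interval, and a component is counted as retained when its lower limit exceeds 1.0. The data are hypothetical, with six variables sharing one common factor.

```python
import numpy as np

def bootstrap_eigenvalues(data, n_boot=1000, rng=None):
    """Sorted correlation-matrix eigenvalues for n_boot bootstrap resamples of rows."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = data.shape
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]      # rows drawn with replacement
        corr = np.corrcoef(sample, rowvar=False)
        boot[b] = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return boot

rng = np.random.default_rng(7)
factor = rng.normal(size=(40, 1))                      # one shared gradient
data = 0.8 * factor + 0.6 * rng.normal(size=(40, 6))   # six correlated variables

boot = bootstrap_eigenvalues(data, n_boot=500, rng=rng)
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)   # percentile limits (assumed)
n_retained = int(np.sum(lower > 1.0))

for k, (lo, hi) in enumerate(zip(lower, upper), start=1):
    print(f"component {k}: 95% limits {lo:.2f} to {hi:.2f}")
print("components with lower limit above 1.0:", n_retained)
```

Because the largest eigenvalue of a correlation matrix can never fall below one, the first component passes this rule even for unstructured data, which is worth keeping in mind when interpreting it.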
Figure 4.1. Correlation structure for the patterned matrices used in S-I to S-IV. The values
presented are the off-diagonal inter-variable correlations. For example, in S-I, variables 1-4
were correlated with one another at r = 0.8, as were variables 5-8, and 9-12. However,
correlations between variables from different submatrices were equal to 0.
C. Scree Plot

Another common method (although used infrequently by ecologists; e.g., Zebra and
Collins [1992]) is that of the scree plot. The method is named for the rubble at the base of a
cliff where a change in the angle of repose occurs. To apply the scree method, one plots the
value of each successive eigenvalue against the rank order (see e.g., Fig. 4.2). The smaller
eigenvalues, representing random variation (i.e., the rubble), tend to lie along a straight line.
The point where the first few eigenvalues depart from the line distinguishes the 'interpretable'
and trivial components. Cattell (1966) originally proposed that points to the left of the
straight-line segment should be considered important (i.e., three components in the structured
data of Fig. 4.2), but subsequently concluded (Cattell and Vogelmann 1977) that the first
eigenvalue to the right of this point should also be included (i.e., four interpretable
components in Fig. 4.2). Often the scree approach is complicated by the lack of any obvious
break or the possibility of multiple break points.

Horn (1965; Horn and Engstrom 1979) recognized that with matrices comprised of
random data, the scree plot would always show a negative slope. Horn argued that
distinguishing eigenvalues in scree plots remained quite arbitrary. As a result, he proposed a
modification to the scree plot (Horn 1965). After analyzing a given data set and plotting the
eigenvalues in a traditional scree plot, numerous matrices of rank equal to the observed data,
but with uncorrelated variables, are generated and eigenvalues are calculated. These
eigenvalues from the random data are tabulated and the mean values plotted on the scree plot
of the original data. The point where the two lines cross indicates the maximum limit where
eigenvalues are considered interpretable. Further variations of this method have included
regression or Monte Carlo approaches (e.g., Allen and Hubbard 1986; Lautenschlager 1989).
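
Horn's modification can be sketched as follows. This is an illustration rather than a published implementation: the function names are hypothetical, the random matrices are standard normal deviates with the same number of observations and variables as the data, and the crossing point is taken as the first component whose observed eigenvalue does not exceed the mean random eigenvalue. The test data have three groups of four variables with within-group correlations of 0.8.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Correlation-matrix eigenvalues in decreasing order."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

def parallel_analysis(data, n_random=200, rng=None):
    """Compare observed eigenvalues with mean eigenvalues of uncorrelated data."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    random_mean = np.mean(
        [sorted_eigenvalues(rng.normal(size=(n, p))) for _ in range(n_random)],
        axis=0)
    exceeds = observed > random_mean
    # Number of leading components before the two curves cross.
    n_retained = p if exceeds.all() else int(np.argmin(exceeds))
    return observed, random_mean, n_retained

rng = np.random.default_rng(11)
block = np.kron(np.eye(3), np.full((4, 4), 0.8))   # three groups of four variables
np.fill_diagonal(block, 1.0)
data = rng.multivariate_normal(np.zeros(12), block, size=40)

observed, random_mean, n_retained = parallel_analysis(data, rng=rng)
print("observed:   ", np.round(observed, 2))
print("random mean:", np.round(random_mean, 2))
print("components above the random curve:", n_retained)
```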

D. Broken-Stick 

Frontier (1976) proposed a broken-stick method that is based on eigenvalues from 
random data. Frontier's model assumes that if the total variance (i.e., sum of the eigenvalues) 
is divided randomly amongst the various components, then the expected distribution of the 
eigenvalues will follow a broken-stick distribution (i.e., the random data in Fig. 4.2). 
Observed eigenvalues are considered interpretable if they exceed eigenvalues generated by the 
broken-stick model. Frontier (1976) and Legendre and Legendre (1983) provide a table of 
eigenvalues based on the broken-stick distribution, but the solution is easily calculated as

\[ b_k = \sum_{i=k}^{p} \frac{1}{i} \]

where p is the number of variables and b_k is the size of the eigenvalue for the kth component
under the broken-stick model.
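
The broken-stick eigenvalues are simple to compute. The sketch below assumes the form b_k = 1/k + 1/(k+1) + ... + 1/p, which sums to p and is therefore on the same scale as eigenvalues from a correlation matrix; the function names and the example eigenvalues are hypothetical. Components are counted from the first until an observed eigenvalue fails to exceed its broken-stick value.

```python
import numpy as np

def broken_stick(p):
    """Expected eigenvalues b_1..b_p under the broken-stick model."""
    # Reverse cumulative sum of 1/i gives b_k = sum_{i=k}^{p} 1/i.
    return np.cumsum(1.0 / np.arange(p, 0, -1))[::-1]

def n_interpretable(eigenvalues):
    """Leading components whose eigenvalues exceed the broken-stick values."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    exceeds = eigenvalues > broken_stick(len(eigenvalues))
    return len(eigenvalues) if exceeds.all() else int(np.argmin(exceeds))

expected = broken_stick(4)
print(np.round(expected, 4))

# Hypothetical eigenvalues from a four-variable correlation matrix.
observed = [2.9, 0.6, 0.3, 0.2]
print("interpretable components:", n_interpretable(observed))
```

For p = 4 the expected values are 25/12, 13/12, 7/12, and 1/4, so only the first hypothetical eigenvalue (2.9) exceeds its broken-stick counterpart.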

E. Proportion of Total Variance 

Another simple criterion for estimating the number of non-trivial components is to
include all components up to some arbitrary proportion of the total variance. This method
typically includes components comprising 95% of the total variance. Although this method is
advocated by some statisticians (Jolliffe 1986), J.E. Jackson (1991) strongly recommended
against its application as being unfounded and unreliable.

Figure 4.2. Eigenvalues from a principal components analysis of a 12-variable data set of
randomly generated, uncorrelated data and for a data set with underlying structure.

Statistical Approaches 

Some data analysts retain components with significant correlations (i.e., P < 0.05) 
between the component scores and the original variables. Statistically this approach is flawed 
because the PCA solution and original variables are not independent, and as a result, the 
attributed significance is inappropriate. In addition, components with only a single 
'significant' correlation suggest that the axis is not a satisfactory multivariate summary. 

F. Test of Sphericity 

Bartlett's test of sphericity (Cooley and Lohnes 1971; Pimentel 1979) evaluates
whether each sequential eigenvalue is significantly different from the remaining eigenvalues.
Conceptually, the test attempts to reveal the point where the PCA summarizes a spherical
distribution of points. The test statistic is calculated as

\[ (p-k)\,\ln\!\left[ \frac{\sum_{i=k+1}^{p} \lambda_i}{p-k} \right] - \sum_{i=k+1}^{p} \ln \lambda_i \]

where p is the number of variables, k represents a specific component, λ_i is the eigenvalue of
component i, and n is the number of observations. If the resultant statistic is multiplied by n-k,
the product is χ² distributed with 0.5(p-k-1)(p-k+2) degrees of freedom. This test was
originally developed for a covariance matrix and several studies recommend its use only in
covariance-based analyses (Dillon and Goldstein 1984; Morrison 1990; Grossman et al. 1991;
J.E. Jackson 1991). However, the test can be used with a correlation matrix where such
results are considered to be conservative estimates of the number of non-trivial components
(Pimentel 1979; Kendall 1980).
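
The sequential test can be sketched as below. The code assumes that the sums run over the p-k eigenvalues remaining after the first k components, as implied by the degrees of freedom; `bartlett_sphericity` is a hypothetical helper, the eigenvalues are made up for illustration, and `scipy.stats.chi2.sf` supplies the upper-tail probability.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(eigenvalues, n, k):
    """Test whether the eigenvalues after the first k components are equal."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1][k:]
    q = lam.size                                   # q = p - k remaining eigenvalues
    stat = (n - k) * (q * np.log(lam.mean()) - np.log(lam).sum())
    df = 0.5 * (q - 1) * (q + 2)
    return stat, df, chi2.sf(stat, df)

# Hypothetical eigenvalues: one dominant component, three equal remainders.
eigenvalues = [3.4, 0.2, 0.2, 0.2]
for k in (0, 1):
    stat, df, p_value = bartlett_sphericity(eigenvalues, n=40, k=k)
    print(f"k = {k}: statistic = {stat:.2f}, df = {df:.0f}, P = {p_value:.4f}")
```

With k = 0 the four eigenvalues are clearly unequal and the hypothesis of sphericity is rejected; with k = 1 the three remaining eigenvalues are identical, the statistic is zero, and no further components would be interpreted.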

G. Bartlett's Test of the Equality of\ } 

Bartlett (1954) also developed a statistical test of whether the first eigenvalue of a 
correlation matrix is equal to the remaining set of eigenvalues (i.e., correlation matrix 
homogeneity). Bartlett's test is based on the log-transformed eigenvalues and calculated as 



X 2 = -[n-_l(2p+5)]£lna p ) 



where n equals the number of observations, p is the number of variables, X p is the eigenvalue 
for the pth components, and the test has p(p-l)l2 degrees of freedom (J.E. Jackson 1991). 
The test is limited because it only examines the first eigenvalue. However it provides an 
assessment of the overall PCA (i.e., if the null hypothesis is not rejected, it is pointless to 
interpret the PCA). 
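
A minimal Python sketch of this calculation follows. The function name and example values are hypothetical, and the multiplier n - (2p+5)/6 is the usual form of Bartlett's statistic, assumed here to be the one intended above.

```python
import math

def bartlett_first_eigenvalue(eigenvalues, n):
    """Return (statistic, df) for Bartlett's test of correlation matrix homogeneity.

    Assumption: the multiplier is n - (2p + 5) / 6 and the sum runs over all
    p log-transformed eigenvalues of the correlation matrix.
    """
    p = len(eigenvalues)
    multiplier = n - (2 * p + 5) / 6.0
    statistic = -multiplier * sum(math.log(v) for v in eigenvalues)
    df = p * (p - 1) / 2
    return statistic, df

# Hypothetical eigenvalues from a 4-variable correlation matrix, n = 40.
statistic, df = bartlett_first_eigenvalue([2.5, 0.5, 0.5, 0.5], n=40)
```

For uncorrelated variables every eigenvalue equals 1, the logarithms vanish, and the statistic is zero.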

H. Lawley's Test of λ₂ 

Lawley (1956, 1963) proposed a method to test for the equality of the p-1 eigenvalues 
(i.e., all but the first eigenvalue). It is based on the following 

$$\chi^2 = \frac{n-1}{\hat{\lambda}^2}\left[\sum_{i<j}\left(r_{ij}-\bar{r}\right)^2 - \hat{\mu}\sum_{k=1}^{p}\left(\bar{r}_k-\bar{r}\right)^2\right]$$

where rᵢⱼ is the correlation between variable i and variable j and 

$$\bar{r} = \frac{2}{p(p-1)}\sum_{i<j} r_{ij}$$

$$\hat{\lambda} = 1-\bar{r}$$

$$\hat{\mu} = \frac{(p-1)^2\left(1-\hat{\lambda}^2\right)}{p-(p-2)\hat{\lambda}^2}$$

$$\bar{r}_k = \frac{1}{p-1}\sum_{i \neq k} r_{ik}, \qquad k = 1,\ldots,p$$

with (p+1)(p-2)/2 degrees of freedom. One limitation of this approach is that the test only 
evaluates the second eigenvalue. Subsequent eigenvalues are not compared when the null 
hypothesis is rejected. This approach has been applied in a recent study of principal 
components analysis with ecological data (Grossman et al. 1991). 
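
The quantities above can be assembled as in the following Python sketch, which is offered as an illustration only. The function name and the example matrix are hypothetical, and the leading factor (n-1)/λ̂² is the usual form of Lawley's statistic, assumed here. The sketch does not guard against a mean correlation of exactly 1, for which λ̂ is zero.

```python
def lawley_statistic(r, n):
    """Return (statistic, df) for Lawley's test from a p x p correlation matrix r."""
    p = len(r)
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    # Mean correlation of each variable with the other p - 1 variables.
    r_k = [sum(r[k][i] for i in range(p) if i != k) / (p - 1) for k in range(p)]
    # Overall mean of the off-diagonal correlations.
    r_bar = 2.0 * sum(r[i][j] for i, j in pairs) / (p * (p - 1))
    lam = 1.0 - r_bar
    mu = (p - 1) ** 2 * (1.0 - lam ** 2) / (p - (p - 2) * lam ** 2)
    ss_pairs = sum((r[i][j] - r_bar) ** 2 for i, j in pairs)
    ss_vars = sum((value - r_bar) ** 2 for value in r_k)
    statistic = (n - 1) / lam ** 2 * (ss_pairs - mu * ss_vars)
    df = (p + 1) * (p - 2) / 2
    return statistic, df

# Hypothetical 4-variable matrix with every correlation equal to 0.3, n = 40.
uniform = [[1.0 if i == j else 0.3 for j in range(4)] for i in range(4)]
statistic, df = lawley_statistic(uniform, n=40)
```

With uniform correlations both sums of squares vanish, so the statistic is zero apart from rounding error. Departures from uniformity among the correlations inflate it.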




I. Bootstrap Eigenvalue-Eigenvector 

In traditional statistical approaches, data sets are summarized with a variety of 
parameters (e.g., the mean). Characteristics of the error associated with a given parameter are 
estimated using formulae based on a number of assumptions (e.g., normality). In the 
bootstrap approach, the estimate of sampling error is derived empirically. Replicated samples 
of the data set are obtained by randomly sampling with replacement from the original data 
set. If the original data set had 40 observations, then the replicated data sets also have 40 
observations, but some observations may occur more than once and some may be omitted. 
The statistic of interest (e.g., the mean) is calculated for the bootstrap sample. Another data 
set is sampled and the statistic calculated. This is repeated a large number of times. The 
distribution of the values from these analyses provides an estimate of the variability 
associated with the mean. Standard errors, confidence limits, or other estimates of variability 
can be calculated using this approach. The bootstrap has many advantages over traditional 
analyses because it requires fewer assumptions. In addition, the sampling properties of some 
parameters can be estimated even though algebraic solutions have not been or cannot be 
determined. 
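
The resampling procedure described above can be sketched for the mean as follows. The data, the seed, the number of replicates, and the simple percentile rule for the confidence limits are all assumptions made for illustration. They are not the settings used in this study.

```python
import random

def percentile_limits(values, level=0.95):
    """Simple percentile limits: drop an equal share from each tail."""
    ordered = sorted(values)
    last = len(ordered) - 1
    cut = int((1.0 - level) / 2.0 * last)   # positions trimmed from each end
    return ordered[cut], ordered[last - cut]

def bootstrap_means(data, n_boot=1000, seed=1):
    """Means of n_boot samples drawn with replacement, each the size of data."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=n)     # sampling with replacement
        means.append(sum(sample) / n)
    return means

observations = [float(v) for v in range(1, 41)]   # hypothetical data, 40 observations
means = bootstrap_means(observations)
lower, upper = percentile_limits(means)
```

The spread of `means` estimates the sampling variability of the mean without any distributional formula. The same loop applies to any other statistic by replacing the line that computes the mean.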

In this study, each principal components analysis was bootstrapped 100 times (note 
that a greater number will provide a more reliable estimate of confidence limits and 500 or 
1000 times is often recommended). Eigenvalues from each bootstrap sample and the 
associated eigenvector coefficients were retained. Means, minima, maxima and 95% 
confidence limits were calculated from the distribution of the eigenvalues. Where the 
confidence intervals overlapped between pairs of successive eigenvalues, these eigenvalues 
were considered to be indistinguishable from one another. However, if the ranges did not 
overlap, the eigenvalues were assumed to be different. This latter condition was considered 
to represent the break-point between 'meaningful' or non-trivial components and those 
associated with sampling and random noise. 

Similarly, the eigenvector coefficients were evaluated using a bootstrap approach. 
Coefficients which did not differ significantly from zero were categorized as trivial or 
non-significant. However, if zero fell outside the 95% confidence limits, then the coefficient was 
considered to be relatively stable and informative. Only bootstrapped components having two 
or more coefficients different from zero were considered as meaningful. Components with 
only a single non-zero coefficient represented only a single variable, hence such a component 
does not provide a true multivariate summary. 
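
The two decision rules can be sketched in Python as follows. The sketch starts from bootstrap results that are already available (one list of eigenvalues, or of coefficients for a single component, per bootstrap sample). The function names, the percentile rule for the limits, and the example values are assumptions made for illustration. It also assumes that the sign of each eigenvector has been aligned across bootstrap samples beforehand, a step that is not shown.

```python
def percentile_limits(values, level=0.95):
    """Simple percentile limits: drop an equal share from each tail."""
    ordered = sorted(values)
    last = len(ordered) - 1
    cut = int((1.0 - level) / 2.0 * last)
    return ordered[cut], ordered[last - cut]

def eigenvalue_breaks(boot_eigenvalues, level=0.95):
    """Positions k (1-based) where the limits of eigenvalues k and k+1 do not overlap."""
    p = len(boot_eigenvalues[0])
    limits = [percentile_limits([rep[i] for rep in boot_eigenvalues], level)
              for i in range(p)]
    return [i + 1 for i in range(p - 1) if limits[i][0] > limits[i + 1][1]]

def nonzero_coefficients(boot_coefficients, level=0.95):
    """Number of variables whose limits on one component exclude zero."""
    p = len(boot_coefficients[0])
    count = 0
    for j in range(p):
        lower, upper = percentile_limits([rep[j] for rep in boot_coefficients], level)
        if lower > 0.0 or upper < 0.0:
            count += 1
    return count

# Hypothetical bootstrap results: 100 samples, 3 eigenvalues and 3 coefficients each.
boot_eigs = [[3.0 + 0.002 * b, 1.0 + 0.002 * b, 0.95 + 0.002 * b] for b in range(100)]
boot_coefs = [[0.5 + 0.001 * b, -0.5 - 0.001 * b, -0.05 + 0.001 * b] for b in range(100)]

breaks = eigenvalue_breaks(boot_eigs)                   # gap after the first eigenvalue only
meaningful = nonzero_coefficients(boot_coefs) >= 2      # two coefficients exclude zero
```

In this example the first eigenvalue is separated from the second, whereas the second and third overlap, so a single non-trivial component would be indicated. The component is treated as meaningful because two of its coefficients differ from zero.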

Several additional methods have been proposed including the ratio of λₖ/λₖ₊₁ (Hirsh 
et al. 1987); the number or percentage of residuals outside univariate limits (Howery and 
Soroka 1987); and cross validation (Wold 1976, 1978; Eastment and Krzanowski 1982). 
However, none of these methods is used commonly, particularly by ecologists. 

As a means of evaluating the overall similarity among the different approaches with 
the different data sets, a multivariate summary was done. The numbers of non-trivial 
eigenvalues from each data set for each method (i.e., Tables 4.2-4.4) were tabulated. Results 
from the 4-variable analyses were not included as some of the eigenvalues could not be 
interpreted (see Results). Results from Bartlett's test of λ₁ and Lawley's test of λ₂ were not 
included as they are limited to evaluating only one or two eigenvalues respectively. Matrix 
observations were based on the methods of eigenvalue assessment for each of the replicated 
data sets (i.e., 8 methods by 3 replicates). In addition, the number of dimensions that were 
simulated for each data set was included as an additional observation. A distance matrix was 
calculated between observations using the Manhattan distance measure. This measure was 
chosen to emphasize the absolute rather than relative distance between points. A cluster 
analysis (UPGMA; Rohlf 1989) was done on the distance matrix to determine the relative 
similarity amongst the different measures. 
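
The distance calculation can be illustrated with a short Python sketch. The three profiles below use only the first replicate of the four patterned matrices (S-I to S-IV) as read from Table 4.4, together with the simulated dimensionality of 3. They form a reduced example, not the full matrix analysed here, and the function names are hypothetical. The clustering step is not shown.

```python
def manhattan(a, b):
    """Sum of absolute differences between two profiles of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(profiles):
    """Manhattan distance between every pair of named profiles."""
    names = list(profiles)
    return {(s, t): manhattan(profiles[s], profiles[t]) for s in names for t in names}

# Number of non-trivial components for S-I, S-II, S-III, S-IV (first replicate).
profiles = {
    "KG-1":    [3, 5, 3, 3],   # Kaiser-Guttman
    "STICK-1": [3, 1, 3, 3],   # broken-stick model
    "known":   [3, 3, 3, 3],   # simulated dimensionality
}
distances = distance_matrix(profiles)
```

Because the measure sums absolute differences, an overestimate of two components and an underestimate of two components lie equally far from the known dimensionality.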

RESULTS 

A. Kaiser-Guttman Approach (λ > 1.0) 

For matrices with correlations of R0 or R3, the Kaiser-Guttman method retained about 
50% of the components for the 4- and 12-variable matrices and 30-40% of the components in 
the 32-variable matrices (Tables 4.1, 4.2 and 4.3). With R8 matrices, only one eigenvalue 
exceeded 1.0, indicating a single interpretable gradient. For the structured matrices, the 
Kaiser-Guttman method of interpreting λs > 1.0 often indicated that the PCAs contained 3 
interpretable components (i.e., within each analysis there were 3 eigenvalues exceeding 1.0). 
For the S-II matrices the approach indicated retaining 5 components (Table 4.4) although 
only 3 dimensions were constructed in the simulations. 

B. Bootstrapped Kaiser-Guttman 

The bootstrap of the eigenvalues using the Kaiser-Guttman approach resulted in only 
one component being considered non-trivial with each of the 4-variable matrices (i.e., 4R0, 
4R3, and 4R8). Four components were retained with 12R0 PCAs and 2-3 components for the 
12R3 matrices. The 32R0 matrices had 9-10 components retained and 8 components for the 
32R3 matrices. For all R8 matrices (i.e., 4R8, 12R8, and 32R8), the method correctly 
indicated that only the first component was non-trivial. In the structured matrices, 3-4 
components were identified as non-trivial in the low-correlation S-II matrices, whereas only 2 
components were retained from S-III, and 3 components from the other analyses (i.e., S-I and 
S-IV). 

Both versions of the Kaiser-Guttman approach indicated a single interpretable 
dimension with the 4-variable matrix of lake morphometry. The PCA based on the 12-variable 
matrix of water chemistry revealed 3 non-trivial components with λ > 1.0. 
However, the bootstrapped evaluation suggested that only the first eigenvalue was 
significantly greater than 1.0. The traditional and bootstrapped approaches indicated that 
9 and 8 components, respectively, were non-trivial in the 32-variable matrix of benthic 
invertebrates. 

Table 4.1. Number of non-trivial components indicated by various methods. Data matrices 
having 4 variables and 40 observations with uniform correlations. 



4R0 



4R3 



4R8 



Morphology 



A B 



B 



Kaiser-Guttman 



1 11111 



1111 



Bootstrap 111 1 11111 
Kaiser-Guttman 

Scree Plot - 2 - 2 2 2 2 

Broken Stick 

95% Variance 444 4 44 3 23 

Sphericity Test 1 11 1 11 

Bartlett's 1 11 1 II 
First Eigenvalue 



Lawley's 



<2 <2 <2 <2 <2 <2 <2 <2 <2 



Bootstrap 1 11 1 11 

Eigenvalue 

Bootstrap 1 11 1 11 

Eigenvector 



1 
3 

3 
1 

24 

2 

Table 4.2. Number of non-trivial components indicated by various methods. Data matrices having 
12 variables and 40 observations with uniform correlations. 



12R0 



12R3 



12R8 



B C 



B C 



Chemistry 



Kaiser-Guttman 

Bootstrap 
Kaiser-Guttman 

Scree Plot 

Broken Stick 



4 4 4 



1 1 1 



3 2 111 



1 111 11 



95% Variance 11 10 11 10 11 10 6 7 7 

Sphericity Test 1 111 11 

Bartlett's 10 1 111 11 
First Eigenvalue 



Lawley's 

Bootstrap 
Eigenvalue 

Bootstrap 
Eigenvector 



< 2 < 2 < 2 < 2 < 2 2+ 2+ 2+ 2+ 



1 111 11 



1 111 11 



? 
1 

4 
3 
8 
3 

1 

2+ 
1 

Table 4.3. Number of non-trivial components indicated by various methods. Data matrices having 
32 variables and 40 observations with uniform correlations. 



Scree Plot 



Broken Stick 



32R0 



32R3 



32R8 



Benthic 
Invertebrates 



ABCABCABC 



Kaiser-Guttman 12 14 13 10 10 11 1 11 

Bootstrap 10 10 9 8 8 8 1 11 

Kaiser-Guttman 



15 5 



Lawley's 

Bootstrap 
Eigenvalue 

Bootstrap 
Eigenvector 



2 2 



111111 



95% Variance 22 22 22 21 21 21 14 14 14 

Sphericity Test 1 1 11 1 2 6 

Bartlett's 1 1 1 1 1 1 1 1 1 
First Eigenvalue 



< 2 < 2 < 2 2+ 2+ 2+ 2+ 2+2+ 



111111 



111111 



2 
19 
14 
! 

2+ 

1 

Table 4.4. Number of non-trivial components indicated by various methods. Patterned data 
matrices having 12 variables. 

                                 S-I          S-II         S-III        S-IV 
                                A  B  C      A  B  C      A  B  C      A  B  C 

Kaiser-Guttman                  3  3  3      5  5  5      3  3  3      3  3  3 
Bootstrap Kaiser-Guttman        3  3  3      4  3  3      2  2  2      3  3  3 
Scree Plot                      4  4  4      3  4  2      4  4  4      4  4  4 
Broken Stick                    3  3  3      1  1  1      3  2  2      3  3  3 
95% Variance                    7  7  8     10 10 10      7  7  8      8  7  8 
Sphericity Test                 3  3  3      2  2  2      3  3  3      3  3  4 
Bartlett's First Eigenvalue     1  1  1      1  1  1      1  1  1      1  1  1 
Lawley's                       2+ 2+ 2+     2+ 2+ 2+     2+ 2+ 2+     2+ 2+ 2+ 
Bootstrap Eigenvalue            3  3  3      0  0  0      3  3  1      3  3  2 
Bootstrap Eigenvector           3  3  3      0  2  1      0  3  3      3  3  3 

C. Scree Plot 

Results from the scree plot based on 4-variable matrices were difficult to interpret. In 
some cases, trends were apparent, but in other cases it was difficult to discern any pattern in the 
plot because only 4 points were available. Where a trend was apparent, the approach 
advocated by Cattell and Vogelmann (1977) suggested that 2 components were non-trivial. 
With the 12R0 analyses, the scree indicated that 4-7 components should be interpretable. The 
number of components dropped to 2-3 when 12R3 matrices were used and 2 components 
would be retained with 12R8 matrices (see Fig. 4.3). In the 32R0 analyses, the scree plot 
results suggested that from 5 to 15 components should be interpreted, with 2-6 components 
for 32R3 and 2 components for 32R8 analyses. 

For the structured matrices, the scree plot suggested that there were 4 non-trivial 
components in the S-I, S-III, and S-IV matrices. For PCAs based on S-II, scree results 
indicated that 2-4 components would be considered interpretable. As with the 
simulated 4-variable matrices, no estimate of the number of dimensions for the lake 
morphometry data could be made because no obvious trend was apparent. With the water 
chemistry data, there were three interpretable components, but a second break was evident (Fig. 
4.3). If this latter point was considered, then a total of 5 components were non-trivial. 

Figure 4.3. Eigenvalues from principal components analyses from data sets which contain 
uncorrelated, weakly correlated, or strongly correlated variables. Each matrix comprises 12 
variables and 40 observations. Eigenvalues from a PCA of 12 lake water-chemistry variables 
are plotted. 

D. Broken-Stick Model 

The broken-stick method correctly identified the dimensionality of all uniform-correlation 
matrices. For the R0 matrices, the method indicated that the underlying 
dimensionality was 0, and it identified 1 component as non-trivial with R3 or R8 matrices. This method 
revealed 3 interpretable components for matrices from S-I and S-IV, a single component from 
S-II, and 2-3 components from S-III. A single component would be retained from the lake 
morphometry data, three from the water chemistry data, and two components from the benthic 
invertebrate data. 
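
For reference, the broken-stick rule can be sketched in Python as follows. The sketch assumes the usual form of the model, in which the expected share of variance for the kth of p components is (1/p) Σ 1/i for i = k, ..., p, and it stops at the first component that fails to exceed its expectation. The function names and example eigenvalues are hypothetical.

```python
def broken_stick_expectations(p):
    """Expected share of total variance for components 1..p under the broken-stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]

def broken_stick_count(eigenvalues):
    """Number of leading components whose share of variance exceeds the expectation."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    count = 0
    for value, expected in zip(ordered, broken_stick_expectations(len(ordered))):
        if value / total > expected:
            count += 1
        else:
            break                      # stop at the first component that falls short
    return count

# Hypothetical 4-variable cases: no structure, and one strong dimension.
no_structure = broken_stick_count([1.0, 1.0, 1.0, 1.0])
one_dimension = broken_stick_count([3.4, 0.2, 0.2, 0.2])
```

With equal eigenvalues the first component accounts for 25% of the variance, well below the expected 52%, so no component is retained. This corresponds to the result of 0 reported above for the R0 matrices.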

E. 95% of the Total Variance 

The approach of retaining components until 95% of the total variance was achieved 
would result in all components being interpreted for the 4R0 or 4R3 analyses, and 2-3 
components for the 4R8 analyses. Application of the method with the 12-variable PCAs led 
to 10-11 eigenvalues being considered important for 12R0 and 12R3 matrices, and 6-7 
eigenvalues for the 12R8 matrices. A total of 22 components would be retained from the 
32R0 analyses, 21 from the 32R3, and 14 components from the 32R8 PCAs. The fixed-variance 
approach also led to more components being retained with the S-I to S-IV matrices 
than any other method. In all cases, the method would lead to 7-11 components being 
considered important and interpreted. This approach led to 3 components being considered 
non-trivial from the lake morphometry data and 8 from the water chemistry data. The 
procedure retained three quarters of the components from each PCA. Nineteen of a possible 
32 components would be kept from the benthic invertebrate PCA. 
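
The fixed-variance rule is simple to state in code. In the Python sketch below the function name and example eigenvalues are hypothetical. Components are added in decreasing order of eigenvalue until the cumulative share of variance reaches the threshold.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative variance reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)

# Hypothetical 4-variable cases: no structure, and one strong dimension.
no_structure = components_for_variance([1.0, 1.0, 1.0, 1.0])
one_dimension = components_for_variance([3.4, 0.3, 0.2, 0.1])
```

With equal eigenvalues every component is needed to reach 95%, which is why the rule retains all components from unstructured matrices and still retains several when only one dimension is present.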

F. Bartlett's Test of Sphericity 

Bartlett's test correctly identified the dimensionality of the PCAs based on 4 variables 
(i.e., 0 for R0, 1 for R3 and R8). The method correctly estimated the dimensionality of the 
12-variable matrices, but was more erratic with the 32-variable matrices. The method 
correctly indicated no significant eigenvalues with 32R0. However, one of the 32R3 matrices 
led to 11 eigenvalues being identified as significant and two PCAs from 32R8 matrices led to 
2 and 6 components being retained. The test indicated 3 significant components for S-I and 
S-III matrices, 2 components for the S-II analyses, and 3-4 for those from S-IV although all 
these matrices were constructed with three-dimensional structure. Bartlett's test of sphericity 
yielded 3 significant eigenvalues from both the 4-variable lake morphometry data and the 
12-variable water chemistry data, plus 19 significant eigenvalues from the PCA of lake benthic 
invertebrate data. 

G. Bartlett's Test of the Equality of λ₁ 

Bartlett's method testing whether the first λ from a correlation matrix is equal to all 
others led to correct identification of the minimum number of interpretable dimensions in all 
4R data sets. With the 12- and 32-variable data sets, the test erroneously indicated λ₁ in one 
12R0 PCA and all of the 32R0 PCAs as being significantly different from the remaining 
eigenvalues. With the patterned matrices (i.e., S-I to S-IV), the test indicated at least one 
significant eigenvalue in each PCA. Bartlett's test indicated that there was at least one 
significant component in each of the lake morphometry, chemistry, and invertebrate PCAs. 

H. Lawley's Test of λ₂ 

Lawley's test for correlation-based PCAs correctly identified a maximum of one 
underlying dimension with the 4-variable solutions. With 12R0 matrices, the test indicated 
fewer than 2 significant λs for the PCAs. However, with 12R3 matrices, one result suggested 
2 or more significant eigenvalues, and all 12R8 matrices had 2 or more 
significant eigenvalues. Results based on 32R0 matrices led to fewer than 2 eigenvalues 
being considered interpretable, whereas all 32R3 and 32R8 analyses indicated 2 or more 
significant λs. Lawley's method indicated that each PCA from S-I to S-IV contained a 
minimum of 2 significantly different eigenvalues. Although neither Lawley's nor Bartlett's 
test is capable of accurately assessing the true dimensionality of these matrices (i.e., 3 
dimensions), they can establish that a minimal number of components is interpretable (i.e., 
either 1 or 2 components respectively). Lawley's test indicated that a minimum of two 
components should be considered interpretable in each of the field data sets. 

I. Bootstrapped Eigenvalue-Eigenvector 

When the bootstrap approach was used to determine overlap between eigenvalues, the 
number of components retained was lower. For the 4R, 12R, and 32R matrices, the method 
correctly identified that no components were interpretable with R0 and only one component 
for PCAs based on R3 and R8 matrices. Identical results occurred with the estimations based 
on the bootstrapped eigenvector coefficients. When the bootstrap was used to distinguish 
between overlapping eigenvalues with the patterned matrices, results suggested that 3 
components be retained for S-I, no components from S-II, 1-3 components from S-III, and 2-3 
components from S-IV. Bootstrapped eigenvector coefficients indicated 3 interpretable 
components for S-I and S-IV, 0-2 components for S-II, and 0-3 components for S-III. These 
approaches suggested 2 and 3 dimensions respectively for the PCA of lake morphometric 
data. For the water chemistry matrix, the bootstrapped eigenvector coefficients indicated that 
two components were interpretable although the bootstrapped-eigenvalue version indicated 
only a single non-trivial component. With the benthic invertebrate data, only a single 
component would be retained based on either approach. 

The multivariate summary using cluster analysis provided an overall assessment of the 
similarity amongst the methods for each replicated data set. The broken stick (STICK), 
bootstrapped eigenvalues (BEIGVAL), and bootstrapped eigenvector coefficients (BEIGVEC) 
all clustered together (Fig. 4.4). The joint clustering of this group of methods indicated the 
high degree of similarity of their results. These methods also provided the best assessment of 
the true dimensionality of the simulated data sets (i.e., the "known" in Fig. 4.4). Bartlett's 
test of sphericity, followed by the scree plot, provided the next most similar results relative to 
the true dimensionality (as defined in the caption of Fig. 4.4). The Kaiser-Guttman, bootstrapped 
Kaiser-Guttman, and the 95%-variance methods all showed the poorest results in assessing the 
underlying dimensionality of the simulated data. 

Figure 4.4. Cluster analysis showing the similarity between methods of eigenvalue assessment 
for each replicated data set. Data in the cluster analysis are based on the number of non-trivial 
components resulting from each method with each data set. Codes are: KG-1 is 
Kaiser-Guttman for the first replicate from each data set; BKG is the bootstrapped 
Kaiser-Guttman; SCREE is the scree plot; STICK is the broken-stick model; 95% is the 95% of the 
total variance criterion; SPHERE is Bartlett's test of sphericity; BEIGVAL is the bootstrapped 
eigenvalue method; BEIGVEC is the bootstrapped eigenvector coefficient method; and known 
is the true number of dimensions incorporated into the data sets during simulation. 

DISCUSSION 

The standard Kaiser-Guttman approach of interpreting λ > 1.0 led to the retention of 
too many components except with matrices having strong correlation structure. Although 
such correlation structures are often found in morphometric analyses, ecologists often analyze 
matrices of limited structure (i.e., weakly correlated data). In such cases, adoption of the 
Kaiser-Guttman method results in the retention of too many components. Jolliffe (1972, 1986) 
suggested that the choice of λ > 1.0 was too conservative and that components with λ > 0.7 
should be considered useful. Clearly, this level of selection would only exacerbate a bad 
situation. Given that the Kaiser-Guttman approach is frequently used by ecologists, it would 
be wise to use an alternative method to choose non-trivial components. The use of bootstrapped 
eigenvalues in a modified Kaiser-Guttman approach does not change the results substantially. 
The bootstrapped Kaiser-Guttman did reduce the number of non-trivial components, but the 
number still exceeded the number of underlying dimensions except for the R8 matrices. 

The scree plot (as applied by Cattell and Vogelmann 1977) provided poor resolution of 
the underlying dimensionality. The method invariably overestimated the number of 
interpretable components. If Cattell's (1966) original criterion was used, the method was 
more conservative. This means including the eigenvalues up to, but not including, the first 
eigenvalue on the straight-line portion of the plot (see Fig. 4.2). However, this modification 
still overestimates the number of interpretable components in analyses of matrices of weakly 
correlated data. Surprisingly, the original approach (Cattell 1966) provided a better estimate 
of the correct number of dimensions with the simulated data matrices. 

The broken-stick method correctly assessed the dimensionality of most data matrices. It 
did, however, underestimate the number of interpretable components for the simulation II 
(S-II) matrices. This method provided a good combination of simplicity of calculation and 
accurate evaluation of dimensionality relative to the other statistical approaches. 

The 95%-variance-threshold method provided unsatisfactory results. Although the 
choice of 95% of the total variance is relatively high, any level is arbitrary. No 
matter what cumulative-percentage level is selected, this approach does not appear promising 
because there is a high risk either that many of the retained components will summarize 
noise or that non-trivial components will not be included. 

Bartlett's test of sphericity correctly identified the dimensionality in many of the data 
sets, but in some cases indicated up to 11 significant eigenvalues although only a single 
dimension was simulated (Table 4.3). Despite the statement by Kendall (1980) that this test 
is overly conservative when applied to correlation matrices, it appears to correctly identify the 
number of dimensions with many data sets, but it is too liberal a test with matrices having a 
low observation-to-variable ratio (e.g., less than the 3:1 ratio advocated by Grossman et al. 
1991). With the ecological data, the test also retained large numbers of the components, i.e., 
19 of 32 components were considered significant with the benthic invertebrate data. 

Bartlett's approach to test for homogeneity of the correlation matrix (i.e., whether the 
first λ equalled all others) appeared to identify the correct minimal dimensionality except with 
the 32-variable analyses. Here the method indicated significant structure with random, 
uncorrelated data. Likewise, Lawley's test consistently overestimated the dimensionality of 
the 12- and 32-variable matrices having uniform correlations. Because the test is designed to 
evaluate only whether λ₂ is the same as successive eigenvalues, the method is rather limited. 
As a result of its limited utility and relatively poor performance in this set of comparisons, 
the method is not recommended. 

The combination of testing for overlap in ranges of bootstrapped eigenvalues and for 
eigenvector coefficients differing from zero appears more promising. With simple matrices 
either lacking structure or having a single dimension, both approaches consistently revealed 
the underlying dimensionality of the simulation. However, with patterned matrices, the results 
were less reliable for either approach individually. Both methods worked well with matrices 
based on S-I and S-IV which had strong inter-variable correlations. However, in S-II where 
there were three underlying, but weak dimensions, there were no differences between the 
bootstrapped eigenvalues. The eigenvector coefficient approach produced inconsistent results 
and frequently underestimated the correct dimensionality. With matrices from S-III, both 
methods correctly identified two dimensions, but from different replicated matrices. 

The poor showing of the eigenvector approach for S-II and S-III is easily explained 
and similar to situations discussed elsewhere (Oksanen 1988). In both simulations, 
three-dimensional data sets were created by having three sets of 4 variables, each set having 
identical correlations. Due to this condition and chance selection of observations in the 
bootstrap, any specific dimension could be expressed on the first component of one PCA, but 
on the second or third component from another PCA. This is similar to the re-ordering of 
components or solution instability found by Oksanen (1988) with detrended correspondence 
analysis. When the order of expression of the underlying dimensions varies between 
components for different analyses, the eigenvector coefficient approach will fail. With these 
same data characteristics, the first three eigenvalues also overlap in their 95% confidence 
limits, but are significantly different from the fourth eigenvalue. The problem with the 
eigenvector coefficients is particularly evident when the initial correlation structure is weak 
(e.g., S-II). However, if the dimensions differ in (1) the strength of correlation structure (i.e., 
several high correlations versus low or medium correlations) or (2) the number of constituent 
variables, or (3) if they have strong correlations, this method provides more accurate results. 
Overall, it appears that the combination of these two approaches, i.e., the bootstrapped 
eigenvalue and eigenvector coefficients, provides a better measure of the dimensionality than 
either approach alone. The maximum value obtained with either approach was close to the 
true dimensionality, except with the S-II data. 

An additional consideration of the bootstrapped eigenvector method is that it assists 
with the evaluation of whether or not each variable contributes to a given component. If a 
specific variable is not significantly weighted on any component, then that variable could be 
removed from the analysis. For example, in the PCA of the lake morphology data, 3 
variables had eigenvector coefficients that differed from zero. However, the lake volume 
coefficients included zero within the 95% confidence limits on each component. Therefore, lake 
volume did not contribute to the analysis and added little information to the PCA. 

With the use of any of these methods employing formal statistical tests, e.g., Bartlett's 
test of sphericity, it is important to recognize the increased probability of rejecting the null 
hypothesis when many components are evaluated sequentially. When such tests are used, 
researchers may remove this increased risk of a Type I error by employing some form of 
correction for multiple comparisons such as Bonferroni's adjustment. 
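
As a simple illustration of such a correction, the Python sketch below applies Bonferroni's adjustment to a set of p-values from sequential tests. The function name and the p-values are hypothetical, and the sketch shows only the basic adjustment, in which the nominal significance level is divided by the number of tests.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Flag each test as significant at the Bonferroni-adjusted level alpha / m."""
    adjusted = alpha / len(p_values)
    return [p <= adjusted for p in p_values]

# Hypothetical p-values from three sequential eigenvalue tests.
decisions = bonferroni_decisions([0.001, 0.02, 0.04])
```

All three hypothetical tests would be significant at the unadjusted 0.05 level, but only the first remains significant once the level is divided among the three comparisons.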

The most promising approaches to component evaluation are the broken-stick model 
and the bootstrapped eigenvalue-eigenvector method. The broken-stick approach has the 
advantage of being simple to calculate. However, within the scope of this study, both 
methods led to similar conclusions about the dimensionality of the simulated data sets. The 
matrices simulated in this study all represented relatively well-conditioned data (e.g., from a 
normal distribution, independent sampling). However, many data used in ecological studies 
do not meet formal assumptions of classical statistical approaches. The extension of this 
comparison to simulated data varying in departure from the statistical 'ideal' would be of 
considerable value (e.g., Davis 1977). Approaches such as the bootstrapped 
eigenvalue-eigenvector method would likely prove more useful with such data conditions than the 
relatively sensitive methods based on idealized distributions and formal tests (e.g., both 
Bartlett's and Lawley's methods). 

Chapter 5: FISH AND BENTHIC INVERTEBRATES: COMMUNITY 
CONCORDANCE AND COMMUNITY-ENVIRONMENT RELATIONSHIPS 



ABSTRACT 

Fish and benthic invertebrates from 40 lakes in south-central Ontario showed significantly 
concordant patterns based on community structure. Fish communities were associated 
significantly with lake morphological characteristics, but were uncorrelated with water 
chemistry. Large, deep lakes differed from shallow lakes in their fish species in having richer 
faunas due to the additional cold-water species. Centrarchid species occurred more frequently 
in small, shallow lakes than in larger lakes. The invertebrate community was not correlated 
with lake morphology, but showed a significant association with water chemistry, principally 
lake pH. A strong contrast in the abundance of Chaoborus and Holopedium existed, but it 
was unclear whether this was due to a predator-prey relationship or to differences in acid 
tolerance. Although the lakes showed similar patterns in the composition of both 
communities, each community was associated with a different set of structuring environmental 
factors. Biotic processes within and between communities explain this paradox in 
community-environment relationships. Such biotic interactions may involve direct processes 
such as fish predation on a particular invertebrate taxon or indirect factors, e.g., fish limiting the 
abundance of invertebrate predators, thereby limiting the impact of these invertebrate 
predators. 

INTRODUCTION 

Community ecologists typically study inter-specific relationships or the relationship 
between a group of species and their physical-chemical environment. Such studies lead 
researchers to greater understanding of interspecific relationships, such as predation and 
competition, from which ideas can be formulated and tested experimentally. As well, 
relationships between communities and their environment provide strong evidence of the 
importance that various environmental factors have in determining species' distribution and 
abundance. Although recognition of the importance of community-level studies is increasing 
(e.g., Carpenter 1988a and references therein), there are remarkably few studies examining 
communities of very different trophic or taxonomic levels from the same environment. Within 
aquatic environments, numerous studies have considered fish, benthic invertebrate, 
phytoplankton, or zooplankton communities. Such research has concentrated on: (1) a single 
community and the abiotic environment; (2) the effect one or a few species from one 
community has on another community; (3) the changes in biomass and production across 
different communities; and (4) changes in diversity in different communities due to 
environmental conditions. 

Overall, ecologists generally have not considered whether different communities show 
similar patterns across varied environments and whether environmental conditions affect 
different communities in a comparable manner. For example, there is evidence that structure 
in fish communities is due to characteristics of lake morphometry (Harvey 1978, 1981; Tonn 
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and Magnuson 1982; Jackson 1988), lake water chemistry (Harvey 1975; Rahel 1986; Somers 

and Harvey 1984; Kelso and Lipsit 1988), and predation (Harvey 1981; Tonn and Magnuson 
1982; Jackson 1988; Jackson et al. 1992). Similarly, researchers have concluded that 
zooplankton and benthic invertebrate communities are structured by different factors. Fish 
predation has been proposed as important in limiting the occurrence and abundance of large 
cladocerans, Chaoborus, odonates and various other invertebrates (e.g.. Brooks and Dodson 
1965; Morin 1984; Post and Cucin 1984; Butler 1989; McQueen et al. 1990, but see Thorp 
and Bergey 1981a,b). This imposes a 'top-down' control on the aquatic ecosystems (see 
Northcote [1988] and Carpenter [1988a] for reviews). In such cases, changes in the 
abundance of large cladocerans via predation are purported to lead to changes in the biomass 
of algae, i.e., large cladocerans are more efficient filter feeders and tend to limit algal 
biomass. Fish preying on Chaoborus, Mysis, or other predatory invertebrates may reduce 
these invertebrates' abundance, thereby minimizing the impact of these invertebrate predators 
on other zooplankton. These changes in zooplankton composition in turn contribute to 
changes in phytoplankton community structure. Similar changes in abundances of other 
invertebrate predators or herbivorous zooplankton may occur due to fish predation, thereby 
leading to altered ecosystems. 

Water chemistry (e.g., pH, phosphorus) has been suggested as being of primary 
importance in structuring invertebrate communities (Sprules 1975; Kilgour and Mackie 1991; 
Tessier and Horowitz 1990). Physiological differences in acid tolerance may change 
invertebrate communities directly via toxicological effects or indirectly through 
changes in food supply, predators, and competitors (e.g., Knoechel and Campbell 1988; Yan 
et al. 1988). Lakes with elevated nutrient levels may have greater productivity, possibly 
leading to anaerobic conditions in the hypolimnion of stratified lakes. Planktonic, benthic 
invertebrate, and fish communities may all be affected by low oxygen levels (e.g., Casselman 
and Harvey 1975; Sikorowa 1978). Additional factors contributing to invertebrate community 
structure and abundance include substrate (Allison and Harvey 1988) and the complexity of 
the environment (Lodge et al. 1988). Habitat complexity is often a function of macrophyte 
abundance and has been shown to contribute to increased abundance and size of invertebrates 
(Crowder and Cooper 1982; Beckett et al. 1992). 

More recently, experimental manipulation of lakes (e.g., Vanni 1988; Carpenter et al. 
1987) has been used to study and test relationships between different taxocene or trophic 
levels while negating criticisms regarding enclosure/exclosure studies. Given the scope of 
these lake-wide manipulations and inter-lake variability, replication amongst lakes is 
impossible. In addition, results from such manipulations may differ when lakes have different 
environmental conditions (e.g., eutrophic versus oligotrophic or acid versus circumneutral). 
Comparisons of different communities across many lakes differing in environmental 
conditions require large, survey-based studies where the variation in different environmental 
conditions can be examined relative to the different communities. 

Given the variation in importance attributed to different environmental factors in 
different systems having different species, the objective of this study was to determine the 
relationship of the fish and benthic invertebrate communities to their chemical and 
morphological environments. Specifically, the analysis tests whether lake water chemistry or 
morphology shows the greater association with the structure of each community and whether 
both communities exhibit similar relationships to these abiotic components. Additionally, it 
tests whether both communities exhibit similar patterns of lake association (i.e., do some 
lakes show similar relationships in both community analyses?) across the lake set. 

METHODS 

Data Collection 

Forty lakes from south-central Ontario (see Appendix) were sampled for lake water 
chemistry, lake morphology, and the abundance of fish and benthic invertebrates. Lakes were 
chosen to provide equal representation across a range of surface area (9-124 ha) and pH 
(5.1-6.9). Chemical concentrations of calcium, magnesium, sodium, potassium, chloride, sulphate, 
dissolved organic carbon (DOC), dissolved inorganic carbon (DIC), phosphorus, and total 
nitrates were determined by the Ontario Ministry of the Environment (Ontario Ministry of the 
Environment 1981). Lake pH was determined in the field. Lake area, volume, maximum 
depth, and total shoreline perimeter were obtained from the Ontario Ministry of the 
Environment (P.J. Dillon, unpubl. data), the Ontario Ministry of Natural Resources (unpubl. 
data), and lake bathymetric maps. Although compound or ratio variables (e.g., lake volume 
development, lake area to watershed area) are often incorporated in such studies, they were 
not included due to their statistical properties and confounding natures (e.g., Kenney 1982; 
Jackson et al. 1990; Jackson and Somers 1991). 

Fish species were sampled using fine- and coarse-meshed trapnets, baited minnow 
traps, clear plastic traps (Casselman and Harvey 1973), and multi-meshed gillnets. 
Species-abundance data from each sample were tabulated. Population estimates were calculated using 
mark-recapture methods in several lakes (Harvey and Lee 1981). Relative abundance 
estimates for each species from each sampling method were evaluated relative to Petersen 
population estimates. Despite intensive sampling (e.g., in excess of several hundred samples 
for some gears in some lakes), inconsistent rankings of abundance between gears within and 
across lakes for different species were found (Jackson, unpubl. data). As the relative 
abundance data proved to be an unreliable indicator of abundance, analyses were restricted to 
fish species' presence-absence data. For community analyses, species found in only a single 
lake were excluded as they contribute no information in across-lake comparisons. 
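
The reduction to presence-absence data and the exclusion of single-lake species can be sketched as follows. This is an illustration only: the lake labels, counts, and function names are hypothetical and are not taken from the study data.

```python
def presence_absence(abundance):
    """Convert {lake: {species: count}} into {lake: set of species present}."""
    return {
        lake: {species for species, count in counts.items() if count > 0}
        for lake, counts in abundance.items()
    }


def drop_single_lake_species(presence):
    """Remove species recorded in only one lake."""
    occurrences = {}
    for species_present in presence.values():
        for species in species_present:
            occurrences[species] = occurrences.get(species, 0) + 1
    keep = {species for species, n_lakes in occurrences.items() if n_lakes > 1}
    return {lake: species_present & keep
            for lake, species_present in presence.items()}


# Hypothetical catches (invented for the example, not data from this study).
catches = {
    "Lake A": {"YP": 40, "WS": 3, "C": 1},
    "Lake B": {"YP": 12, "WS": 0, "C": 0},
    "Lake C": {"YP": 0, "WS": 7, "C": 0},
}
filtered = drop_single_lake_species(presence_absence(catches))
```

In this invented example the species coded C occurs in one lake only and is therefore dropped, whereas YP and WS are retained.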

Benthic invertebrates were sampled by SCUBA divers. Hand cores of 5.21 cm 
inside diameter were taken to a sediment depth of 10 cm at each of 10 locations along each 
of 6 transects in each lake (Allison and Harvey 1981; Collins et al. 1981). Transects were 
stratified relative to the proportion of each substrate type and stratum area following Allison 
and Harvey (1981). Each set of 10 cores was preserved immediately in 10% formalin. 
Invertebrates were separated from the substrate using a sucrose-flotation method (Allison and 
Harvey 1981) and generally identified to the familial or ordinal level to remain consistent 
with previous studies (see Table 5.1; Allison and Harvey 1981; Allison and Harvey 1988). 

Nineteen of the 40 lakes were sampled in 1979 and 1980 for lake water chemistry, 
fish species, and benthic invertebrate abundance (Allison and Harvey 1981; Harvey and Lee 
1981). The remaining 21 lakes were sampled in 1989 for benthic invertebrates and water 
chemistry. The fish data for these 21 lakes were from Jackson (1988). 

Statistical Analysis 

Water chemistry and lake morphometry variables were transformed to approximate 
normal distributions and linearize bivariate relationships. Square-root transformations were 
used with potassium, sulphate, nitrate, phosphorus, lake surface area, volume, maximum 
depth, and perimeter. Magnesium concentration was arcsin transformed and the remaining 
chemistry variables, except pH, were log-transformed. 
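
A minimal sketch of how the variable-by-variable transformations might be applied is given below. The variable labels and the function name are hypothetical, and the natural logarithm is an assumption because the base is not stated. The arcsine transformation of magnesium is deliberately left out, as its exact form is not described.

```python
import math

# Variables assigned to each transformation in the text (labels are hypothetical).
SQRT_VARS = {"potassium", "sulphate", "nitrate", "phosphorus",
             "area", "volume", "max_depth", "perimeter"}
LOG_VARS = {"calcium", "sodium", "chloride", "DOC", "DIC"}
UNTRANSFORMED = {"pH"}


def transform(name, value):
    """Return the transformed value for one measurement of one variable."""
    if name in UNTRANSFORMED:
        return value
    if name in SQRT_VARS:
        return math.sqrt(value)
    if name in LOG_VARS:
        return math.log(value)  # natural log: an assumption, base not stated
    # Magnesium (arcsine) is not handled: the form of that transformation
    # is not given, so it is not guessed here.
    raise ValueError(f"no transformation defined for {name!r}")


example = {name: transform(name, 9.0) for name in ("area", "calcium", "pH")}
```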

Benthic data were first summarized within each lake. Various transformations of the 
benthic abundance data were made based on preliminary analysis using Taylor's Power Law. 
However, the log(x+1) transformation was found to provide results comparable to those 
using Taylor's Power Law. Similar findings have been indicated by Green (1979). Lake 
means were calculated using the transformed data and formed the basis of subsequent 
multivariate analyses. 
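
The logic of this step can be sketched as follows. Taylor's Power Law relates the variance of counts to their mean as variance = a * mean^b, and the slope b suggests the variance-stabilizing transformation x^(1 - b/2), which becomes a logarithm when b = 2. The function names are hypothetical, the regression is ordinary least squares on natural logs, and adding 1 before taking logs (to allow zero counts) is an assumption.

```python
import math


def taylor_exponent(means, variances):
    """Slope b of log(variance) regressed on log(mean) by least squares."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def taylor_transform(x, b):
    """Transformation implied by slope b; logarithmic when b is 2."""
    if abs(b - 2.0) < 1e-9:
        return math.log(x + 1.0)  # +1 so that zero counts are defined
    return x ** (1.0 - b / 2.0)


def lake_mean_log(counts):
    """Lake mean of log(x + 1)-transformed core counts."""
    return sum(math.log(c + 1.0) for c in counts) / len(counts)
```

When the estimated slope is close to 2, the power transformation and log(x+1) give similar results, which is consistent with the comparability noted above.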



Table 5.1. Codes for lake, fish, and invertebrate names. 

Lake Code  Lake            Fish Code  Fish Species            Invertebrate Code  Invertebrate Taxa

Bass       Basshaunt       B          Burbot                  Acar               Acari
Bent       Bentshoe        BBH        Brown bullhead          Bosm               Bosminidae
Bigw       Bigwind         BM         Brassy minnow           Cala               Calanoida
B_Ch       Blue Chalk      BNTD       Blacknose dace          Cera               Ceratopogonidae
Buch       Buchanan        BNM        Bluntnose minnow        Chaob              Chaoborinae
Cind       Cinder          BNS        Blacknose shiner        Chir               Chironomidae
Clay       Clayton         BS         Brook stickleback       Chyd               Chydoridae
Cross      Crosson         BT         Brook trout             Cycl               Cyclopoida
Dickie     Dickie          C          Cisco                   Daph               Daphniidae
Gull       Gullfeather     CC         Creek chub              Elmi               Elmidae
Harv       Harvey          CS         Common shiner           Ephem              Ephemeroptera
Heen       Heeney          FHM        Fathead minnow          Gast               Gastropoda
L_Cl       Little Clear    FSD        Finescale dace          Harp               Harpacticoida
L_Fl       Lower Fletcher  GS         Golden shiner           Holo               Holopedidae
L_Wr       Little Wren     ID         Iowa darter             Macro              Macrothricidae
McKe       McKeown         LC         Lake chub               Misc               Miscellaneous
Plas       Plastic         LMB        Largemouth bass         Neur               Neuroptera
Poor       Poorhouse       LT         Lake trout              Neml               Nemata
Porc       Porcupine       NRD        Northern redbelly dace  Odon               Odonata
R_Ch       Red Chalk       PD         Pearl dace              Olig               Oligochaeta
Rido       Ridout          PKS        Pumpkinseed             Ostr               Ostracoda
Soli       Solitaire       RB         Rock bass               Pele               Pelecypoda
Teap       Teapot          RT         Rainbow trout           Sidi               Sididae
Thre       Three Island    SMB        Smallmouth bass         Talt               Taltridae
Trou       Troutspawn      SS         Slimy sculpin           Tric               Trichoptera
Walk       Walker          WS         White sucker
                           YP         Yellow perch

Principal components analysis (PCA) was used on correlation matrices of the water 
chemistry and lake morphometry data to summarize inter-lake patterns in both data sets. 
Both the fish community data and the benthic invertebrate data sets were analyzed using 
correspondence analysis (CA). The PCA and CA summarized the maximum amount of 
variation in each data set permitting assessments of community structure and inter-lake 
patterns. 

Both the fish and invertebrate data sets were analyzed using canonical correspondence 
analysis (CCA - ter Braak 1986) with the lake water chemistry and morphology data sets. 
This multivariate method summarizes the maximum amount of variation in one of the biotic 
data sets while constraining it to be associated with axes based on linear composites of the 
environmental data. For example, relationships between the invertebrate taxa are summarized 
such that the community relationships and the gradients in water chemistry are maximally 
correlated. The method permits a multivariate direct-gradient analysis of the community and 
environment. 

Multivariate summaries of the benthic invertebrate and fish communities were 
compared using PROTEST (Jackson 1992a). PROTEST is a test of matrix concordance 
which incorporates Procrustean matrix rotation (Gower 1971, 1975; Digby and Kempton 1987; 
Rohlf 1990) where translation, rotation, reflection, and dilation fit one matrix to another 
matrix. In this example, the method tries to match the position of each lake in a multivariate 
space defined by the CA of the invertebrate data to the position of the same lake in the space 
defined by the CA of the fish species, thereby assessing the degree to which both 
communities have similar inter-lake patterns. The method minimizes the sum of the squared 
deviations (i.e., m²; Gower 1975) between the pair of points representing each lake such that 
the greater the similarity of the multivariate configurations from the data sets, the lower the 
m² value. This measure is compared to that derived from repeatedly randomizing the 
configuration from one matrix and recalculating the m². The percentage of randomized m² 
values equal to or less than the observed m² provides the significance level of the test. 
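
The matching step and its randomization test can be sketched for the two-dimensional case. This is a minimal illustration and not the PROTEST program itself: it assumes only two ordination axes per data set (the analysis reported here used four), it relies on a closed-form solution that holds only in two dimensions, and all function names are hypothetical. Each configuration is centred and scaled to unit sum of squares, so m² ranges from 0 for identical configurations towards 1 for unrelated ones. Counting the observed arrangement as one of the randomizations is a common convention and is an assumption here.

```python
import math
import random


def _center_and_scale(points):
    """Centre on the centroid, scale to unit sum of squares, return complex values.

    Assumes the points are not all identical (non-zero spread).
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    z = [complex(p[0] - cx, p[1] - cy) for p in points]
    norm = math.sqrt(sum(abs(v) ** 2 for v in z))
    return [v / norm for v in z]


def procrustes_m2(x, y):
    """m^2 between two 2-D configurations after the best rotation,
    reflection, and dilation of one onto the other."""
    z = _center_and_scale(x)
    w = _center_and_scale(y)
    rotation = abs(sum(a.conjugate() * b for a, b in zip(z, w)))
    reflection = abs(sum(a * b for a, b in zip(z, w)))
    return 1.0 - max(rotation, reflection) ** 2


def protest(x, y, n_perm=999, seed=0):
    """Return (observed m^2, proportion of arrangements with m^2 <= observed)."""
    rng = random.Random(seed)
    observed = procrustes_m2(x, y)
    hits = 1  # the observed arrangement is counted once
    shuffled = list(y)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the lake-to-lake pairing
        if procrustes_m2(x, shuffled) <= observed:
            hits += 1
    return observed, hits / (n_perm + 1)
```

A small proportion indicates that the two sets of lake positions agree more closely than randomly paired positions would.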

RESULTS 

Lake Water Chemistry and Morphology 

The PCA of lake water chemistry contained a single non-trivial component (Chapter 4 
and Jackson 1992b) summarizing 41% of the overall variation. Overall, this axis represents 
an ionic gradient dominated by the pH, cation, and conductivity variables ranging from low 
values at the left end (e.g., Herb, Plastic, Clayton Lakes; Fig. 5.1) to the higher pH-cation 
lakes (Basshaunt, Poorhouse) at the right side of the plot. Three Island Lake was an 
exception to this general gradient as it is positioned with the low pH lakes due to low cation 
concentrations, despite a pH of 6.1. 

A size gradient was found in the PCA of lake morphology (Fig. 5.2). Along the first 
axis, lakes ranged from being small and relatively shallow (Jill, Clayton, Buchanan) to lakes 
that have larger surface areas and greater depths (Solitaire, Bear, Bigwind). This axis 
accounted for 76% of the observed variation in lake morphometry. The second axis was also 
significant (Jackson 1992b) and summarized 19% of the overall variation. This axis can be 
interpreted as ordering lakes on the basis of depth relative to surface area. Louie Lake is 
exceptionally deep (43 m) for its surface area (31 ha) relative to Fawn (8 m and 87 ha) or 
Dickie Lake (12 m and 93 ha). 

Fish Community 

The correspondence analysis of the fish community summarized 42% of the total 
variance in the first four axes (Table 5.2). Plots of the species composition contrast 
small-bodied fishes, principally cyprinids, on the left side of the first axis with larger species, 
predominantly centrarchids, which are positioned on the right (Fig. 5.3). No discernible 
pattern in composition was interpreted from the second axis. On the graph, a species is 
positioned at the centroid of the coordinates from the corresponding plot of lakes (i.e., Fig. 
5.4) that contain the species. For example, fathead minnow occurs in Three Island, 
Troutspawn, Poorhouse, Solitaire, Bigwind, Buchanan, and Poorhouse Lakes. Therefore, it is 
plotted at the centroid of the CA coordinates for these lakes. If one considers a line passing 
through the origin, then species located at the same end of the line are found together 
frequently, whereas species located at opposite ends are found together rarely, if at all. 
Similarly, lakes positioned at opposite ends contain distinctly different fauna (Fig. 5.4). For 
example, opposite the fathead minnow is the cisco which occurs in Teapot and Harp Lakes. 
Fathead minnow and cisco do not co-occur in this lake set and the lakes containing fathead 
minnow have a different fish fauna from those lakes containing cisco. Rarely occurring 
species (e.g., cisco, slimy sculpin) are positioned farther away from the graph origin, as are 
the lakes containing these rare species (e.g., Teapot, Harp). 
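
The placement of a species at the centroid of the lakes that contain it can be illustrated with a short calculation. The lake labels, coordinates, and function name below are hypothetical and serve only to show the averaging step.

```python
def species_centroid(lake_scores, lakes_with_species):
    """Mean ordination coordinates of the lakes in which a species occurs."""
    points = [lake_scores[lake] for lake in lakes_with_species]
    n_axes = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(n_axes))


# Hypothetical lake coordinates on Axis I and Axis II (not values from Fig. 5.4).
lake_scores = {
    "Lake A": (-1.0, 0.5),
    "Lake B": (1.0, 1.5),
    "Lake C": (3.0, -2.0),
}
position = species_centroid(lake_scores, ["Lake A", "Lake B"])
```

A species found in only one or two outlying lakes is placed at or near those lakes' coordinates, which is why rare species tend to lie far from the origin.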

Fish - Water Chemistry 

In the correspondence analysis of the fish species constrained by water chemistry (i.e., 
CCA; Fig. 5.5), the species show patterns similar to the fish ordination with most small fishes 
positioned on the left-most side of the plot. There are minor re-arrangements in the relative 
positions of species. Species such as Iowa darter, finescale dace, and many other cyprinid 
species have positions indicating increased occurrence in lakes having higher pH and cation 
concentrations (i.e., the pH vector is oriented towards the coordinates of these species, 
indicating increasing pH levels in lakes containing those species; conversely, species or 
lakes positioned on the opposite side of the origin represent those found in more acidic 
lakes). In contrast, species such as rock bass, brown bullhead, and largemouth bass show 
more frequent occurrence in more acid, higher DOC waters. As vectors connecting lake 
chub, brook stickleback, fathead minnow and cisco to the origin are orthogonal with the pH 
vector, these species show little or no difference in their occurrence along the pH gradient. 
Table 5.2. Eigenvalues from principal components analysis of lake water chemistry and 
morphology, correspondence analysis of benthic invertebrate and fish community, and 
canonical correspondence analysis of communities and environmental data. 

                                 Eigenvalues
Data Set                         Axis I   Axis II   Axis III   Axis IV    Total

PCA Chemistry                     4.879     2.396      1.497     1.153   12.000
PCA Morphology                    3.029      .753       .191      .027    4.000
CA Fish                            .272      .217       .181      .179    2.032
CCA Fish - Chemistry               .144      .115       .101      .078    2.032
CCA Fish - Morphology              .121      .073       .034      .018    2.032
CA Invertebrate                    .067      .045       .041      .033     .336
CCA Invertebrate - Chemistry       .042      .019       .017      .012     .336
CCA Invertebrate - Morphology      .019      .013       .008      .004     .336
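
The percentages quoted in the text follow from these eigenvalues as each eigenvalue divided by the relevant total. The short check below uses values from Table 5.2; the function and variable names are hypothetical. For the fish-morphology CCA, dividing by the sum of the four canonical eigenvalues is an assumption, adopted because it reproduces the 49% and 30% reported for the explained joint variance.

```python
def percent_of_total(eigenvalues, total):
    """Express each eigenvalue as a percentage of the given total."""
    return [100.0 * value / total for value in eigenvalues]


# Values taken from Table 5.2.
pca_chem_axis1 = percent_of_total([4.879], 12.000)[0]
pca_morph = percent_of_total([3.029, 0.753], 4.000)
ca_fish_four_axes = sum(percent_of_total([0.272, 0.217, 0.181, 0.179], 2.032))
ca_invert = percent_of_total([0.067, 0.045, 0.041, 0.033], 0.336)

# Assumption: "explained joint variance" is relative to the canonical axes only.
cca_fish_morph = [0.121, 0.073, 0.034, 0.018]
cca_fish_morph_pct = percent_of_total(cca_fish_morph, sum(cca_fish_morph))
```

Rounded to whole percentages, these reproduce the figures given in the Results for the chemistry PCA (first axis), the morphology PCA (first two axes), and the first four axes of each correspondence analysis.
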

Figure 5.1. Components I and II from a principal components analysis of lake water 
chemistry. The first component shows increasing pH and cation concentrations towards the 
right. 

Figure 5.2. Components I and II from a principal components analysis of lake morphology. 
The first axis represents a general lake-size gradient increasing in size from left to right. The 
second component is interpreted as contrasting lakes of increasing depth relative to surface 
area from the bottom to top of graph. 

Figure 5.3. The association of fish species from a correspondence analysis of fish 
presence-absence across lakes. Species codes are given in Table 5.1. 



[Figure 5.4. Lake ordination plot (Axis I versus Axis II); only scattered lake labels survive from the original figure.] 

Lakes are positioned along a chemistry continuum with one extreme being a set of 
lakes showing high pH, Mg, K, DIC and conductivity, but low levels of dissolved organic 
carbon, nitrates, and phosphorus (Fig. 5.6). This group includes lakes such as Red Chalk, 
Blue Chalk, Louie, Clear, and others positioned in the lower left quadrant. This set of lakes 
contrasts with other, more dystrophic lakes (e.g., Herb, Dan, Leech). Not all lakes follow 
such a simple pH-cation versus DOC gradient. Poorhouse Lake is one of the highest pH 
lakes, but is positioned in the upper-left quadrant, as is Clayton Lake, even though the latter 
is one of the more acid-stressed clearwater systems. 

The first and second axes of the water chemistry ordination explain 20% and 16% of 
the joint variation with the fish-species ordination, but the first canonical axes of the fish 
community and water chemistry are not significantly associated (P = .50). 

Fish - Morphology 

A canonical correspondence analysis of fish species and lake morphometry shows a 
strong relationship between species such as lake trout, rainbow trout, cisco, burbot, and slimy 
sculpin with large, deep lakes (Fig. 5.7). Smaller, shallower lakes tended to include 
largemouth bass, rock bass, and brown bullhead. Several species, including smallmouth bass, 
golden shiner, pumpkinseed, yellow perch, white sucker, creek chub, northern redbelly dace, 
and Iowa darter, exhibit equal tendencies to occur in small and shallow or large and deep 
lakes. 

Figure 5.5. The association of fish species and water chemistry from a canonical 
correspondence analysis of fish and water chemistry across the lakes. 

Figure 5.6. The association of lakes based on a canonical correspondence analysis of fish 
species and lake water chemistry. 

Most lakes are easily categorized along this axis of depth and lake size (Fig. 5.8). 
Red Chalk, Harp, Blue Chalk, Clear, Bigwind and Louie are the large, deep lakes. Cinder 
and Bear lakes are somewhat atypical in that they are large, deep lakes, but they have 
extremely high lake perimeters relative to other lakes. The small, shallow, polymictic lakes 
include Leech, Jill, Dickie, and Fawn. Due to their relatively shallow depths, these lakes 
typically do not stratify and have isothermal profiles during the late summer. Plastic Lake is 
positioned with this group, but is deep enough to stratify thermally. Many lakes (e.g., Herb, 
Solitaire) are positioned orthogonal to the general lake-size and depth axis and do not show 
similar trends in lake morphometry and fish species composition. The first two canonical 
axes account for 49% and 30% of the explained joint variance in fish species and lake 
morphometry. A significant relationship (P = 0.03) existed between the canonical axes of fish 
species composition and lake morphometry. 

Invertebrate Community 

The first four axes from a correspondence analysis of benthic invertebrate abundances 
accounted for 55% of the total variance (Table 5.2). The first axis (20% of the total variance) 
is dominated by the separation of Chaoborus from the remaining taxa. Chaoborus is 
positioned to the extreme right whereas several species (e.g., Holopedium) are located at the 
left end (Fig. 5.9). Increased occurrence and abundance of Chaoborus is associated with 
decreased occurrence and abundance of Holopedium, Pelecypoda, and Daphniidae. 
As Chaoborus and Bosminidae are positioned at right angles to one another, the occurrence 
and abundance of these two taxa appear to vary independently of one another. In general, the 
second axis (13.5%) contrasts Bosminidae to the remaining taxa, particularly the Odonata, 
Macrothricidae, and Taltridae. 

Figure 5.7. The association of fish species and lake morphometry from a canonical 
correspondence analysis of fish and lake morphometry across the lakes. 

Figure 5.8. The association of lakes based on a canonical correspondence analysis of fish 
species and lake morphometry. 

Figure 5.9. The association of benthic invertebrates from a correspondence analysis of 
invertebrate abundance across lakes. 

The plot of the lake ordination (Fig. 5.10) shows Fawn Lake at the extreme right end, 
followed by Dan, Poorhouse and McKeown Lakes. Overall, the only distinguishing trend 
along this axis is an increasing predominance of dystrophic lakes as one moves from left to 
right along the first axis. There are no apparent trends in the relative positioning of lakes 
along the second axis. 

Invertebrates - Water Chemistry 

The chemistry-constrained ordination of the invertebrates led to a similar 
representation of the taxa as that found in the ordination of the invertebrates alone (Fig. 5.11). 
The first axis retains the contrast between Chaoborus and Holopedium whereas the second 
axis shows extremes of Bosminidae to Odonata. The first axis approximates a gradient of 
pH, calcium and dissolved organic carbon with the position of Chaoborus indicating a greater 
abundance at lower pH and lower calcium concentrations, but higher dissolved organic 
carbon. 

The lake ordination has Fawn and Cinder Lakes distinct from the remainder of the 
lakes (Fig. 5.12). Most lakes are clustered together in the central portion of the graph. When 
lake water chemistry is considered with the invertebrates, a pH gradient is found to extend 
from high pH lakes (e.g., Blue Chalk and Red Chalk) to lakes of lower pH (e.g., Fawn, 
McKeown). The canonical axis, dominated by the pH gradient, is associated significantly (P 
< .01) with the first axis from the benthic invertebrates. This axis explains 34% of the joint 
variation in the invertebrate and chemistry data indicating close correspondence between the 
invertebrate biota and their chemical environment. Although the second canonical axis 
explained 18% of the joint variation in invertebrates and chemistry, it was considered 
uninterpretable as only a single significant chemistry component existed. 

Invertebrates - Morphology 

In contrast to previous results, the analysis of invertebrates and morphology led to 
Odonata being distinguished from the remaining taxa (Fig. 5.13). Chaoborus remains 
separate, but is not as atypical as found in the invertebrate or invertebrate-chemistry analyses. 
The position of Odonata, Macrothricidae, and Ephemeroptera indicates that their greatest 
abundance occurs in the small, shallow lakes such as Dan and McKeown Lakes. Gastropoda, 
Harpacticoida, and Oligochaeta occur more frequently in the larger, deeper 
lakes. Most of the taxa are clustered around the origin indicating no strong affinity to 
extreme lake morphological conditions. Chaoborus' outlying position is at right angles to the 
morphological gradient; therefore, there is little variation in Chaoborus' abundance relative to 
lake morphology. Chaoborus is most abundant in Walker, Ridout, and Shoe Lakes (Figs. 
5.13 and 5.14). Overall, the relationship between the invertebrate and morphological 
canonical axes was non-significant (P = 0.17). 

Figure 5.10. The association of lakes based on a correspondence analysis of the benthic 
invertebrate abundance. 

Figure 5.11. The association of benthic invertebrates and water chemistry from a canonical 
correspondence analysis of invertebrate abundance and water chemistry across the lakes. 

Comparison of Fish - Invertebrate Communities 

Four CA axes from each of the fish and invertebrate ordinations were used in the 
PROTEST analysis. These axes were derived as weighted mean species scores so that the 
scaling represented a Euclidean distance plot. The test showed a significant concordance 
between the two matrices (P < .02) indicating that the arrangements of lakes in each 
ordination space are more similar than expected due to random chance. The inclusion of 
additional CA axes resulted in similar conclusions, but fewer axes yielded probabilities 
greater than 0.10. 

Figure 5.12. The association of lakes based on a canonical correspondence analysis of 
invertebrate species and lake water chemistry. 

Figure 5.13. The association of benthic invertebrates and lake morphology from a canonical 
correspondence analysis of invertebrate abundance and lake morphology. 

Figure 5.14. The association of lakes based on a canonical correspondence analysis of 
invertebrate abundance and lake morphology. 

DISCUSSION 

The fact that the lakes exhibit similar patterns based on their fish and benthic 
invertebrate communities may not be surprising. Considerable research has shown that fish 
can alter both the benthic and nektonic communities. Many studies have been based on 
enclosures where fish are held or exclosures where fish are excluded, but the treatment is 
assumed to have no direct effect on the benthic community. Several of these cage and 
whole-lake studies have documented changes in the composition of the invertebrates (Zaret and 
Paine 1973; Mills and Schiavone 1982; Magnan 1988; Butler 1989), reductions in the average 
size of the invertebrates (Morin 1984; Post and Cucin 1984), or reductions in the standing 
biomass but increased production of invertebrates (Post and Cucin 1984). There are other 
studies demonstrating a lack of effect (e.g., Thorp and Bergey 1981a). Results from 
enclosure/exclosure studies have been questioned regarding methodological problems (e.g., 
biofouling; Thorp and Bergey 1981a). As well, the spatial and temporal scale of these studies 
has been criticized as ecological processes may differ at larger scales due to environmental 
heterogeneity and prey refuge, and seasonal effects may differ from those found in short-term 
enclosure experiments (Wellborn and Robinson 1991; Carpenter 1988b; Stein et al. 1988). 

In what appears rather paradoxical, this study shows similar structure between the fish 
and benthic invertebrate communities across the lakes, but different relationships for each 
taxonomic community and the lake environmental conditions. Fish communities appear to be 
structured more by the morphological characteristics of the lakes whereas the invertebrate 
results indicate that the chemical environment is more critical. The importance of lake 
morphology, particularly depth, is well recognized with fish communities (Harvey 1978; Tonn 
and Magnuson 1982; Johnson et al. 1977; Jackson 1988). Many species require cold-water 
refuges during summer; unless lakes are sufficiently deep and productivity is too low to cause 
anaerobic conditions below the epilimnion, these species are excluded from the community. 
At the latitude and elevation of this study, summer epilimnial temperatures may exceed the 
upper lethal temperatures of cold-water species (trout, char, ciscoes) and approach those of 
cool-water species (e.g., white sucker). 

During winter, low oxygen levels may develop under ice cover, especially in shallow 
lakes. Fish species differ in their tolerance of, and ability to find refuges from, such 
low-oxygen conditions. Species such as centrarchids and esocids show limited tolerance 
relative to cyprinid and ictalurid species. Thus, there may be community-wide selection of 
species based on their oxygen requirements (Jackson and Harvey 1989). In this study, the 
centrarchids were found to be associated with the smaller, shallower lakes (Fig. 5.7). In 
general, the centrarchids are intolerant of low oxygen levels. Therefore, it can be inferred 
that winter oxygen conditions do not appear to be of major importance across all the lakes, 
but low levels may be important in specific lakes, e.g., Three Island (Jackson 1988). The 
environmental condition contributing the most to fish community structure is likely lake depth 
and the associated thermal stratification permitting survival of cold-water species such as lake 
trout, rainbow trout, and cisco during the summer. Thus depth and the related temperature 
and oxygen factors provide selective forces acting on the community throughout much of the 
year. 

The effect of water chemistry cannot be ruled out. Within specific lakes, it may be 
important in limiting species composition. Lake pH has been shown to significantly affect the 
occurrence of fish species (Beamish and Harvey 1972; Somers and Harvey 1984; Rahel 
1986). Within this lake set, some species may be limited by lake pH. For example, in 1979 
Plastic Lake had fathead minnow, and small populations of yellow perch and rock bass. 
When intensively sampled during 1984 (Jackson and Harvey, unpubl. data), fathead minnow 
was not found, yellow perch remained limited in numbers, but the rock bass population had 
increased. Population estimates from 1987 showed yellow perch had increased to between 
60,000-140,000 individuals (D. McQueen, York Univ., pers. comm.) from the 200 reported by 
Harvey and Lee (1981). Dillon et al. (1987) showed a steady decline in Plastic Lake's 
alkalinity during this time, contributing to increased spring pH depression and fish kills 
(Harvey 1989). Thus changes in the abundance of fish populations and the loss of fathead 
minnow paralleled changes in the water chemistry of Plastic Lake. 

The effect of water chemistry was represented more strongly in the benthic 
invertebrate communities where a significant relationship was found between the community 
composition and a pH-dominated environmental gradient. Although numerous studies of 
stream benthos have clearly documented the relationship between pH and community 
composition (Wright et al. 1984; Ormerod and Edwards 1987; Rutt et al. 1989, 1990), results 
vary amongst lake studies. Several have shown a reduction in taxonomic richness in acidified 
lakes, e.g., gastropods and pelecypods (Hagen and Langeland 1973; Roff and Kwiatkowski 
1977; Collins et al. 1981; Harvey and McArdle 1986), macrothricids (Økland 1969; Harvey 
and McArdle 1986), Mysis relicta (Nero and Schindler 1983) and decapods (France 1984). 
Reductions in the biomass or density of ephemeropterans, odonates, trichopterans, 
neuropterans, and plecopterans have all been found in acid-stressed lakes (Hagen and 
Langeland 1973; Hendrey et al. 1976; Grahn et al. 1974; although Harvey and McArdle 
[1986] found no reductions in these taxa). 

Given the considerable variation and conflicting results between studies, the 
relationships between invertebrates and pH are unclear. Some of this variability may be 
attributed to the concentration and bioavailability of toxic metals, most especially aluminum 
(Stokes et al. 1989). Simultaneous changes occurring at different trophic levels within the 
lakes may also confound the relationship. Taxa such as fish and crayfish may limit the 
abundance of various benthic organisms through predation. As fish, crayfish, and additional 
predators are often intolerant of acid conditions, these species may be reduced or eliminated, 
thereby reducing predation pressure on more acid-tolerant taxa and the numbers of prey may 
increase. This indirect effect of lake acidification has the potential to alter considerably the 
dynamics of ecosystems. Reductions in fish predation may lead to increased predation by 
Chaoborus, thereby altering the cladocerans and, subsequently, additional zooplankton and 
phytoplankton species via competition. Clearly, the complexity of these interactions 
complicates any direct assessment of the relationship between pH and the benthic community. 
Although the literature generally agrees that changes in the benthos occur, it is likely that 
some of the changes may be indirect rather than strictly due to the toxicity of hydrogen ion 
per se. 

Chaoborus abundance was associated with more acid lakes, but negatively correlated 
with Holopedium. Yan et al. (1985) found Chaoborus abundance was greatest in acid lakes. 
Yan et al. (1988) showed a negative relationship between Daphnia spp. and Holopedium, but 
they could not determine whether competition, differential predation, or pH tolerance was 
more important in the negative correlation between the abundances of these taxa. Results 
show both taxa to have similar occurrences and both are correlated negatively with 
Chaoborus abundance. Whether this pattern depends on a lesser tolerance to acid by 
Daphniidae and Holopedium relative to Chaoborus or a direct predator-prey effect between 
these taxa is unclear. The results show that lakes with circumneutral pH had increased 
abundances of Holopedium, Pelecypoda, Ephemeroptera, Neuroptera, Harpacticoida, and 
Nemata. Lakes with lower pH and higher DOC conditions had greater concentrations of 
Chaoborus, Sididae, and Elmidae relative to higher pH lakes. The most abundant taxon was 
Chironomidae which was ubiquitous and showed no trend in abundance or occurrence relative 
to lake acidity. Similar results were found by Harvey and McArdle (1986). 

It is possible that the lack of a significant relationship between the invertebrate 
community and lake morphometry may represent a true lack of association in nature. Many 
benthic invertebrates are substrate-specific (Beckett et al. 1992) and Patalas (1971) suggested 
that zooplankton communities are structured according to lake morphometry. Substrate 
heterogeneity within each lake may be sufficient to provide the physical habitat 
required for each taxon. Given that most major forms of substrate are found in most of 
these lakes, this is a reasonable assumption. If so, the abundance of the taxon may depend on 
either the chemical or biotic environment. An alternative explanation is that the level of 
taxonomic resolution may not be sufficient to distinguish between species within a single 
family or order which segregate at specific depths. Many of the taxa are restricted to or more 
abundant in the littoral zone. For these taxa, it is likely that differences in taxonomic 
resolution would not change relationships. However, some taxa, e.g., Chironomidae, are 
present at various depths and may show taxonomic relationships relative to depth. For such 
taxa, more detailed taxonomic identification may provide significant associations with lake 
morphometry. Unfortunately, the rarity of many taxa would preclude such analyses unless 
large areas of substrate at different depths were sampled. Warwick (1988) has shown that 
identification at the specific and ordinal level led to similar results in marine benthic 
community analyses. However, additional research is warranted on what importance this 
taxonomic resolution has on the interpretation of freshwater community-environment 
relationships. 

The paradox of a concordance between the fish and invertebrate communities, but 
different relationships between the communities and the environment is likely due to an 
interaction between the biotic and abiotic environments. The fish community is structured 
primarily on the basis of the physical environment of the lake. Conditions necessary for 
spawning (e.g., shoals), as well as cold-water refuges for various species, may be present 
only in larger lakes. Within a general lake morphotype, chemical conditions like pH and 
dissolved oxygen are likely important in determining the viability of certain species, e.g., 
fathead minnow, smallmouth bass. These particular factors contribute to determining the fish 
species which can survive within a lake. In addition to these factors, the abundance of fishes 
will be influenced by lake productivity via invertebrate productivity. 

The invertebrate community is structured by the chemical portion of the abiotic 
environment. In conjunction with the influence of chemistry, the invertebrates are impacted 
by the fish community. The interactions between the fish and invertebrates are likely complex 
due to a variety of direct and indirect effects (sensu Kerfoot and Sih 1987). Some 
invertebrate taxa may be limited in abundance due to direct predation by fish or show 
increased abundance due to reduced invertebrate predation mediated by fish. Given the 
network of possible interactions between these two communities, it is difficult to predict the 
responses of many invertebrate taxa to changes in their chemical environment. The 
relationship of some invertebrate taxa, e.g., cladocerans, to different fish communities is 
predictable and similar patterns result from surveys across lakes or from experimental 
manipulations. However, the response of many benthic taxa may be species specific as 
shown by changes in size structure of invertebrate communities (Post and Cucin 1984). A 
general shift to smaller taxa, reduced biomass, and increased rates of production is to be 
expected with increased predation by fish, but our understanding remains insufficient to 
permit more detailed predictions about benthic-community structure in response to changes in 
the fish community. 

SUMMARY AND CONCLUSIONS 

From several chapters, it is apparent that decisions made regarding the form of data 
analysis can have considerable influence on the resultant representation of community and 
environmental patterns. However, it is not so simple as to define such decisions as right or 
wrong. Many of these choices are made in order to emphasize or de-emphasize some 
attributes of the data. Some of these choices are made when deciding about whether absolute 
or relative species abundance is more important (e.g., Euclidean versus chi-squared distances) 
or whether all variables should be considered of equal importance or their importance should 
be weighted by their magnitude of variability (i.e., correlation versus variance-covariance 
matrices). The combined influence of these various decisions has been largely neglected in 
the past, particularly in the aquatic sciences. However, as shown in this study, such decisions 
regarding the data analysis may contribute to very different conclusions about community and 
environmental patterns. Researchers are cautioned to recognize a priori the consequences 
such decisions may have before proceeding in accordance with the attributes desired. 
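
The contrast between distance measures can be illustrated with a small sketch. The three lakes and their abundances below are invented for illustration and are not data from this study. The chi-squared distance is written in the form commonly used in correspondence analysis (differences between row profiles, weighted by the inverse of the column masses); this form is an assumption about which variant is meant. Lakes A and B share the same relative composition but differ tenfold in absolute abundance, whereas lake C has the same total as A but a different composition.

```python
import math

# Hypothetical abundances (rows = lakes, columns = three species); not study data.
lakes = {
    "A": [10, 20, 30],
    "B": [100, 200, 300],  # same proportions as A, ten times the abundance
    "C": [30, 20, 10],     # same total as A, different composition
}

def euclidean(a, b):
    """Euclidean distance on raw abundances (emphasizes absolute differences)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chi_squared(table, i, j):
    """Chi-squared distance between lakes i and j of a lakes-by-species table."""
    rows = list(table.values())
    grand_total = sum(sum(r) for r in rows)
    # Column mass: each species' share of the grand total.
    col_mass = [sum(col) / grand_total for col in zip(*rows)]
    # Row profiles: relative abundances within each lake.
    p = [x / sum(table[i]) for x in table[i]]
    q = [x / sum(table[j]) for x in table[j]]
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, col_mass)))

for i, j in [("A", "B"), ("A", "C")]:
    print(i, j,
          "Euclidean:", round(euclidean(lakes[i], lakes[j]), 2),
          "chi-squared:", round(chi_squared(lakes, i, j), 3))
```

With these invented values, Euclidean distance places A much closer to C than to B, whereas chi-squared distance treats A and B as indistinguishable because only relative composition enters the calculation. Neither result is wrong; each emphasizes a different attribute of the data.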

One of the more common approaches to data analysis is through the use of compound 
(e.g., ratio or product) or compositional (i.e., frequency, percentage, proportion) variables. 
Although such variables often are intuitively desirable, the covariance structure between 
variables may contribute to confusing, rather than clarifying, relationships. The hypotheses of 
interest must be considered carefully and the appropriate model chosen. Frequently the use of 
ratios in statistical models may lead to the testing of inappropriate null hypotheses. In many 
situations the use of linear-based procedures, such as regression or analysis of covariance, 
provides a better and more powerful analytical approach than ratios. 
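
The way a shared denominator can create apparent relationships may be sketched with simulated data. Everything below is hypothetical: `x`, `y`, and `z` are independent random variables drawn from arbitrary uniform ranges, not measurements from this study. The sketch compares the correlation between the raw variables, the correlation between the ratios `x/z` and `y/z`, and the correlation between residuals after regressing each variable on `z` (a simple linear-model alternative to forming ratios).

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Residuals from the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Three mutually independent variables (hypothetical ranges, fixed seed).
rng = random.Random(42)
n = 2000
x = [rng.uniform(50, 150) for _ in range(n)]
y = [rng.uniform(50, 150) for _ in range(n)]
z = [rng.uniform(1, 10) for _ in range(n)]

r_raw = pearson(x, y)
r_ratio = pearson([a / c for a, c in zip(x, z)],
                  [b / c for b, c in zip(y, z)])
r_adjusted = pearson(residuals(x, z), residuals(y, z))

print("raw x vs y:          ", round(r_raw, 3))
print("ratios x/z vs y/z:   ", round(r_ratio, 3))
print("residuals after z:   ", round(r_adjusted, 3))
```

Although `x` and `y` are unrelated, the two ratios are strongly positively correlated because both inherit the variation in `1/z`. Adjusting for `z` by regression leaves the correlation near zero, which is the appropriate answer for these simulated data.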

Significantly concordant patterns in the fish and benthic invertebrate communities were 
found amongst lakes. Despite the considerable differences in taxonomy and ecology of these 
organisms, they were found to exhibit similar inter-lake patterns. The relative importance of 
the environmental conditions in influencing the structure and composition of the communities 
varied greatly between the fishes and invertebrates. Fish communities were associated with 
lake morphology, but not water chemistry. Morphological conditions related to lake depth are 
likely of principal importance in determining species composition. Deeper lakes have a 
reduced risk of winterkill. The development of thermal stratification and a well-oxygenated 
hypolimnion may represent the most important single factor influencing fish community 
composition. This environmental condition permits the survival of many cool- and cold-water 
fishes not present in smaller, shallower lakes. Although water chemistry is known to be 
important in determining the survival of some fish species (e.g., pH and fathead minnow), it 
did not appear to be as important as lake morphology. This may be due to the limited 
number of low pH lakes included in the study. 

In contrast, the invertebrates showed a greater correspondence to the chemical rather 
than the morphological environment. This may be due to a greater sensitivity to acid or low 
calcium waters by some taxa (e.g., Mollusca) than found in the fishes. It may be that the 
coarser taxonomic resolution of the benthos precluded identifying taxa specific to epilimnetic 
or hypolimnetic waters as found for fishes. The importance of this taxonomic scaling 
requires additional study. Alternatively, it may be that sufficient morphological variation 
occurred in most or all lakes to provide enough heterogeneity in the physical environment 
(e.g., substrate). If most physical environments are present in all lakes, then the chemical 
environment may provide the varied and determining force. 

Undoubtedly there are influences on each community imposed by the other 
community. Such interactions form the basis of 'top-down' versus 'bottom-up' debates. 
Although this study shows concordance between the two communities, it cannot assess 
causality. Such information may be obtained more readily via experimental studies within 
lakes selected from this study. The use of studies similar to the present one will allow lakes 
to be identified containing similar environmental conditions, but different communities; 
different environmental conditions, but similar communities; or different fish and benthic 
communities. The ability to categorize lakes on the basis of these conditions will aid 
researchers in choosing lakes for experimental studies. 

Community ecology is progressing rapidly in terms of quantitative approaches. The 
inherent multivariate, and frequently nonlinear, relationships of organisms and their 
environments have proven a considerable challenge to ecologists. Community ecology has 
recently been advancing from simple descriptive studies to those testing formal hypotheses of 
community structure and community-environmental relationships. Recently developed 
methods (e.g., canonical correspondence analysis, PROTEST) hold promise not only for 
summarizing patterns, but as means of developing predictive relationships between the 
community and the environment. Such developments will help ecologists better understand 
biotic-abiotic interactions and predict alterations of the biota subject to environmental changes 
(e.g., acidification, climate change). 
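
A randomization test of concordance between two ordinations can be sketched as follows, assuming that PROTEST refers to a Procrustean randomization test. This is a minimal illustration rather than the implementation used in this study: it is restricted to two-dimensional ordination scores, treats each point as a complex number so that the best rotation or reflection has a closed form, and uses invented lake scores. The names `standardize`, `procrustes_m2`, and `protest` are hypothetical.

```python
import math
import random

def standardize(points):
    """Centre (x, y) scores and scale them to unit sum of squares."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [v - mean for v in z]
    norm = math.sqrt(sum(abs(v) ** 2 for v in z))
    return [v / norm for v in z]

def procrustes_m2(a, b):
    """Symmetric Procrustes statistic m^2 for standardized 2-D configurations."""
    rotation = abs(sum(u.conjugate() * v for u, v in zip(a, b)))
    reflection = abs(sum(u.conjugate() * v.conjugate() for u, v in zip(a, b)))
    return 1.0 - max(rotation, reflection) ** 2

def protest(scores_a, scores_b, n_perm=999, seed=1):
    """Return (observed m^2, permutation p-value) for two sets of site scores."""
    rng = random.Random(seed)
    a, b = standardize(scores_a), standardize(scores_b)
    observed = procrustes_m2(a, b)
    hits = 1  # the observed arrangement counts as one permutation
    shuffled = list(b)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the lake-to-lake pairing
        if procrustes_m2(a, shuffled) <= observed + 1e-12:
            hits += 1
    return observed, hits / (n_perm + 1)

# Hypothetical ordination scores for eight lakes (not study data).
fish_scores = [(0.0, 0.0), (1.0, 0.2), (2.1, 0.1), (0.3, 1.4),
               (1.2, 1.1), (2.4, 1.6), (0.8, 2.5), (1.9, 2.7)]

# A second configuration that is the first one rotated, rescaled, and shifted,
# i.e. perfectly concordant apart from arbitrary orientation and scale.
angle = math.radians(40)
invert_scores = [(2 * (x * math.cos(angle) - y * math.sin(angle)) + 5,
                  2 * (x * math.sin(angle) + y * math.cos(angle)) - 3)
                 for x, y in fish_scores]

m2, p_value = protest(fish_scores, invert_scores)
print("m^2 =", round(m2, 6), " p =", p_value)
```

Because the second configuration is an exact rotation and rescaling of the first, the observed m² is effectively zero and few, if any, random pairings match it. With real ordinations the statistic lies between 0 and 1, and the permutation p-value indicates how often random pairings of lakes are at least as concordant as the observed one.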

APPENDIX I: FISH SPECIES COMMON AND SCIENTIFIC NAMES 

Family            Scientific Name              Common Name

Salmonidae        Oncorhynchus gairdneri       Rainbow trout
                  Salvelinus namaycush         Lake trout
                  Salvelinus fontinalis        Brook trout
                  Coregonus artedii            Cisco
Osmeridae         Osmerus mordax               Rainbow smelt
Cyprinidae        Phoxinus eos                 Northern redbelly dace
                  Phoxinus neogaeus            Finescale dace
                  Couesius plumbeus            Lake chub
                  Hybognathus hankinsoni       Brassy minnow
                  Notemigonus crysoleucas      Golden shiner
                  Notropis cornutus            Common shiner
                  Notropis heterodon           Blackchin shiner
                  Notropis heterolepis         Blacknose shiner
                  Pimephales notatus           Bluntnose minnow
                  Pimephales promelas          Fathead minnow
                  Rhinichthys atratulus        Blacknose dace
                  Semotilus atromaculatus      Creek chub
                  Semotilus margarita          Pearl dace
Catostomidae      Catostomus commersoni        White sucker
Ictaluridae       Ictalurus nebulosus          Brown bullhead
Gadidae           Lota lota                    Burbot
Gasterosteidae    Culaea inconstans            Brook stickleback
                  Pungitius pungitius          Ninespine stickleback
Centrarchidae     Ambloplites rupestris        Rock bass
                  Lepomis gibbosus             Pumpkinseed
                  Micropterus dolomieui        Smallmouth bass
                  Micropterus salmoides        Largemouth bass
Percidae          Perca flavescens             Yellow perch
                  Etheostoma exile             Iowa darter
Cottidae          Cottus cognatus              Slimy sculpin